
2022 IEEE International Conference on Signal Processing and Communications (SPCOM): Latest Papers

Employee Face Recognition Scheme Using A Common Space Mapping Approach
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840824
Arsalan Malik, H. Kusneniwar, Sandeep Joshi
In this work, we present a FaceNet-based ‘two-branch’ model for employee face recognition in low-resolution images captured using substandard camera sensors. Our model involves a common space mapping approach using two deep convolutional neural networks (DCNNs) that map the low-resolution and high-resolution face images to a common space. The model is trained such that the distance between the two mapped images in the common space is minimized. Then, a logistic regression classifier is used to classify the mapped image by the identity of the employee. We show through simulations that the presented model achieves a recognition accuracy of 99.84%, 98.88%, and 95.53% on $36\times 36$, $24\times 24$, and $16\times 16$ resolution images, respectively, for 209 subjects. Furthermore, the proposed model has low storage (90 megabytes) and computation requirements, making it suitable for systems with limited computing power and memory.
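The training objective described in this abstract can be sketched as follows. This is only a minimal illustration, assuming squared Euclidean distance between paired embeddings; the abstract says only that a "distance" is minimized, and the function names and toy vectors here are hypothetical rather than taken from the paper.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def common_space_loss(lr_embeddings, hr_embeddings):
    """Mean squared distance between paired low-res and high-res embeddings.

    Assumption: each low-resolution embedding is paired with the
    high-resolution embedding of the same face image.
    """
    pairs = list(zip(lr_embeddings, hr_embeddings))
    return sum(squared_distance(u, v) for u, v in pairs) / len(pairs)


# Toy embeddings (hypothetical values) standing in for the two DCNN outputs.
lr = [[0.0, 0.0], [1.0, 1.0]]
hr = [[3.0, 4.0], [1.0, 1.0]]
print(common_space_loss(lr, hr))
```

Driving this quantity toward zero pulls the two branches' outputs together, so a classifier trained on the common space can be applied to low-resolution inputs.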
Citations: 0
A Novel Method for Millimetre-Wave Channel Estimation for 1-bit Quantized Receivers using Low-Rank Matrix Constraints
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840832
Swati Bhattacharya, K. Hari
Analog-to-digital converters (ADCs) for millimetre-wave (mmWave) systems have to operate at a very high sampling rate due to the high bandwidth involved. This leads to huge power consumption. One way to reduce the power consumption is to design low-resolution ADCs, with 1-bit ADCs as the extreme case. However, channel estimation in such receivers is a challenging task due to the non-linearity introduced. Previous estimation methods utilised the low-rank property of mmWave channels. This paper proposes two methods that use the additional constraints of entry-wise infinity norm and angular sparsity, which improve the normalised mean square error of the channel estimates by up to 4.5 dB for a range of signal-to-noise ratio values and various antenna configurations.
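The non-linearity mentioned in this abstract comes from the quantizer itself. The sketch below shows a common model of a 1-bit ADC acting on complex baseband samples, keeping only the signs of the real and imaginary parts. The function names and the convention of mapping zero to +1 are assumptions made for illustration; the paper's exact signal model is not given in the abstract.

```python
import random


def one_bit_quantize(z: complex) -> complex:
    """Keep only the sign of the real and imaginary parts of a sample.

    Assumption: a value of exactly zero is mapped to +1.
    """
    re = 1.0 if z.real >= 0 else -1.0
    im = 1.0 if z.imag >= 0 else -1.0
    return complex(re, im)


def quantized_observation(samples):
    """Apply the 1-bit quantizer to every received sample."""
    return [one_bit_quantize(z) for z in samples]


rng = random.Random(0)
received = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4)]
print(quantized_observation(received))
```

Because scaling a sample by any positive constant leaves the output unchanged, amplitude information is lost, which is one reason structural constraints on the channel matter for estimation.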
Citations: 0
Using Performance of ASR to Compute Optimal Location of Microphone
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840766
K. Nathwani, Bhavya Dixit, Sunil Kumar Kopparapu
It has been observed that measurement error in the microphone position relative to a fixed source location affects the room impulse response (RIR). This in turn affects single-channel close-microphone and multi-channel distant-microphone speech recognition. To this end, we systematically study how to identify the optimal location of the microphone, given an approximate and hence erroneous location of the microphone in 3D space. The primary idea is to use a Monte Carlo technique to generate a large number of random microphone positions around the erroneous microphone position and select the position that results in the best performance of a general-purpose automatic speech recognition (ASR) system. We experiment with clean and noisy speech and show that the optimal microphone location that achieves the best ASR performance is affected not only by the noise characteristics but also by the SNR.
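The Monte Carlo search described in this abstract can be sketched as below. This is a hedged illustration only: the uniform sampling in a cube, the radius, and `toy_score` (a stand-in for running an ASR system and measuring its accuracy) are all assumptions, not details from the paper.

```python
import random


def sample_positions(nominal, radius, count, seed=0):
    """Draw candidate 3D positions uniformly from a cube around `nominal`."""
    rng = random.Random(seed)
    return [
        tuple(c + rng.uniform(-radius, radius) for c in nominal)
        for _ in range(count)
    ]


def best_position(candidates, score):
    """Return the candidate with the highest score (e.g. ASR accuracy)."""
    return max(candidates, key=score)


def toy_score(pos, true_pos=(2.0, 1.5, 1.2)):
    """Hypothetical stand-in for ASR performance: higher near `true_pos`."""
    return -sum((p - t) ** 2 for p, t in zip(pos, true_pos))


nominal = (2.1, 1.4, 1.2)  # approximate, hence erroneous, measurement
candidates = sample_positions(nominal, radius=0.2, count=1000)
print(best_position(candidates, toy_score))
```

In practice the score function would simulate the RIR for each candidate position and evaluate recognition on the resulting speech, which is far more expensive than this toy distance.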
Citations: 0
Secret-Key Based Non-Coherent Signalling to Mitigate Reactive Injection Attacks
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840817
Neha Goel, Vivek Chaudhary, J. Harshan
When communicating under strong adversarial models, non-coherent signalling schemes are known to be robust when compared to their coherent counterparts. In this regime, we consider a reactive adversarial model wherein the adversary listens to the ON-OFF keying based signalling from the victim and then injects energy on the victim’s frequency only when it detects the OFF-state of the transmitter. We show that this attack model forces the receiver to witness non-zero energy levels at all time instants, thereby degrading the error performance. To circumvent this problem, we propose secret-key based non-coherent signalling wherein the transmitter and the receiver use a pre-shared key to pick a random energy level from a dictionary when communicating the ON-state. As a result, the adversary will not be able to inject the appropriate energy level with probability one. For the proposed countermeasure, we first study its uncoded variant and propose an optimization problem over the choice of the non-zero energy levels in the dictionary in order to minimize the average error rate for the victim. Subsequently, we also propose a coded non-coherent signalling scheme and study the choice of decoding strategies to further improve the average error rate over the uncoded counterpart. Through extensive simulations, we show that the proposed countermeasure assists the victim in achieving high-reliability communication with non-zero rate despite the presence of a reactive adversary.
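The key-driven choice of ON-state energy level can be sketched as follows. This is an assumption-laden illustration: deriving the index from HMAC-SHA256 over a slot counter is a choice made here for concreteness, and the energy values are hypothetical. The abstract states only that a pre-shared key selects a random level from a dictionary.

```python
import hashlib
import hmac


def keyed_energy_level(key: bytes, slot: int, levels):
    """Pick the ON-state energy level for time slot `slot` using a shared key.

    Assumption: HMAC-SHA256 of the slot index acts as the shared randomness.
    The modulo step has a negligible bias for small dictionaries.
    """
    digest = hmac.new(key, slot.to_bytes(8, "big"), hashlib.sha256).digest()
    return levels[int.from_bytes(digest[:4], "big") % len(levels)]


def transmit_energy(bit: int, key: bytes, slot: int, levels):
    """OFF-state sends zero energy; ON-state sends the keyed level."""
    return keyed_energy_level(key, slot, levels) if bit else 0.0


levels = [1.0, 2.0, 4.0]  # hypothetical dictionary of non-zero energy levels
key = b"pre-shared-secret"
print([transmit_energy(b, key, t, levels) for t, b in enumerate([1, 0, 1, 1])])
```

Both ends compute the same level for a given slot, while an adversary without the key cannot predict which level an injected ON-state should imitate.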
Citations: 0
Outage Performance of Optimal Relay and Antenna Selection Schemes with TAS/MRC and TAS/SC for Spectrum-Sharing Network under Imperfect CSI
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840842
A. Prathyusha, P. Das
Cooperative relaying and multiple-input multiple-output transmission technologies exploit spatial diversity to improve the performance of the secondary users in an underlay spectrum-sharing network. For this, we present an optimal relay and transmit antenna selection (ORTAS) scheme and an optimal relay and antenna pair selection (ORAPS) scheme by employing transmit antenna selection/maximal ratio combining and transmit antenna selection/selection combining strategies, respectively. Assuming imperfect channel knowledge of the interference links to the primary receiver, the secondary source and relays back off their transmit powers sufficiently to satisfy an interference outage constraint. We derive closed-form expressions for the outage probability of the secondary network under non-identically distributed Rayleigh fading channels for both schemes. We also derive insightful asymptotic outage probability expressions for the ORTAS scheme, considering two distinct scenarios in order to guarantee different quality-of-service requirements for the primary network. Both proposed schemes substantially outperform several other relay and antenna selection schemes and offer a better performance/complexity tradeoff.
Citations: 0
USS Directed E2E Speech Synthesis For Indian Languages
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840801
Sudhanshu Srivastava, H. Murthy
State-of-the-art end-to-end (E2E) text-to-speech synthesis systems produce highly intelligible speech. But they lack the timbre of Unit Selection Synthesis (USS) and do not perform well in a low-resource environment. Moreover, the high synthesis quality of E2E is limited to read speech; for conversational speech synthesis, we observe the problem of missing words and the creation of artifacts. On the other hand, USS not only produces the exact speech according to the text but also preserves the timbre. Combining the advantages of USS and the continuity property of E2E, this paper proposes a technique to combine the classical USS with the neural-network-based E2E system to develop a hybrid model for Indian languages. The proposed system guides the USS system using the E2E system. Syllable-based USS and character-based E2E TTS systems are built. Mel-spectrograms of syllable-like units generated in the USS and E2E frameworks are compared, and the mel-spectrogram of the better unit is used in the WaveGlow vocoder. A dataset of 5 Indian languages is used for the experiments. Using the hybrid system, DMOS scores are obtained for conversational speech utterances that were improperly synthesized in the vanilla E2E and USS frameworks, and an average absolute improvement of 0.3 is observed over the E2E system.
Citations: 2
FAR-GAN: Color-controlled Fashion Apparel Regeneration
Pub Date : 2022-07-11 DOI: 10.1109/SPCOM55316.2022.9840795
Gaurab Bhattacharya, Kuruvilla Abraham, Nikhil Kilari, V. B. Lakshmi, J. Gubbi
Automatic fashion apparel regeneration is important for e-commerce retailers because it lets customers preview the selected dress in the desired color. This helps in improving customer satisfaction and sales. In this work, we propose FAR-GAN, a fashion apparel synthesis tool with explicit control over color. The proposed approach augments the features from the fashion apparel and its edge-map in a two-step encoding process to extract the style information. This information is controlled with the target color embedding information in the decoder. To control the color of the synthesized apparel image, we propose a color consistency loss. Overall, the network can be trained end-to-end without incorporating any complex sub-units while controlling the color of the synthesized product image. We have conducted extensive experiments and an ablation study to showcase the performance of our model compared to several state-of-the-art methodologies. The results reflect improvement in performance and justify our design choices.
Citations: 2
On Comparison of Encoders for Attention based End to End Speech Recognition in Standalone and Rescoring Mode
Pub Date : 2022-06-26 DOI: 10.1109/SPCOM55316.2022.9840823
Raviraj Joshi, Subodh Kumar
Streaming automatic speech recognition (ASR) models are more popular and suitable for voice-based applications. However, non-streaming models provide better performance as they look at the entire audio context. To leverage the benefits of the non-streaming model in streaming applications like voice search, it is commonly used in second-pass re-scoring mode. The candidate hypotheses generated using streaming models are re-scored using a non-streaming model. In this work, we evaluate non-streaming attention-based end-to-end ASR models on the Flipkart voice search task in both standalone and re-scoring modes. These models are based on the Listen-Attend-Spell (LAS) encoder-decoder architecture. We experiment with different encoder variations based on LSTM, Transformer, and Conformer. We compare the latency requirements of these models along with their performance. Overall, we show that the Transformer model offers acceptable word error rate (WER) with the lowest latency requirements. We report a relative WER improvement of around 16% with second-pass LAS rescoring, with latency overhead under 5 ms. We also highlight the importance of a CNN front-end with the Transformer architecture to achieve comparable WER. Moreover, we observe that in the second-pass re-scoring mode all the encoders provide similar benefits, whereas the difference in performance is prominent in standalone text generation mode.
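Second-pass rescoring and the "relative WER improvement" figure can be sketched as below. This is an illustration under stated assumptions: linear interpolation of first-pass and second-pass log-scores is a common choice, not necessarily the paper's, and the hypotheses, scores, and WER values are made up.

```python
def rescore(nbest, second_pass_score, weight=0.5):
    """Pick the best hypothesis after combining first- and second-pass scores.

    nbest: list of (hypothesis_text, first_pass_log_score) pairs.
    second_pass_score: callable returning a log-score for a hypothesis.
    """
    def combined(item):
        text, first = item
        return (1 - weight) * first + weight * second_pass_score(text)

    return max(nbest, key=combined)[0]


def relative_wer_improvement(baseline_wer, new_wer):
    """Fractional reduction in WER relative to the baseline."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical first-pass N-best list and second-pass scores.
nbest = [("play songs", -1.0), ("play song", -1.2)]
second_pass = {"play songs": -3.0, "play song": -0.5}
print(rescore(nbest, second_pass.get))
print(relative_wer_improvement(10.0, 8.4))
```

A relative improvement of 16% means the WER drops by 16% of its baseline value, for example from 10.0 to 8.4, not by 16 absolute points.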
Citations: 2
Channel estimation for double IRS assisted broadband single-user SISO communication
Pub Date : 2022-04-19 DOI: 10.1109/SPCOM55316.2022.9840513
Vishnu Karthikeya Gorty
In this paper, a single-user single-input single-output (SISO) communication system assisted by two intelligent reflecting surfaces (double IRS) is considered. The cascaded channels (mobile user (MU) $\rightarrow$ IRS-1 $\rightarrow$ base station (BS), MU $\rightarrow$ IRS-2 $\rightarrow$ BS, and MU $\rightarrow$ IRS-1 $\rightarrow$ IRS-2 $\rightarrow$ BS) are estimated under a Bayesian setting. Here, the goal is to evaluate the performance of the estimator for the MU $\rightarrow$ IRS-1 $\rightarrow$ BS and MU $\rightarrow$ IRS-2 $\rightarrow$ BS channel links using the Bayesian Cramér-Rao lower bound (CRLB). Without a closed-form pdf of the inner product of circularly symmetric complex Gaussian (CSCG) random vectors, we cannot obtain the Fisher information in closed form. Hence, we obtain the Bayesian CRLB by numerical computation. In the simulation results, we show that the pdf of the inner product of CSCG random vectors can be approximated by a Rayleigh distribution as the number of elements on the IRS increases, which is analogous to the Central Limit Theorem (CLT). The results also show that the mean squared error (MSE) almost matches the Bayesian CRLB.
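The CLT-style approximation mentioned in this abstract can be checked numerically with a small simulation. The sketch below reads the claim as a statement about the magnitude of the inner product of two independent unit-variance CSCG vectors; that reading, the vector length, and the trial count are assumptions made here, not details from the paper.

```python
import math
import random


def cscg(rng):
    """One unit-variance circularly symmetric complex Gaussian sample."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def inner_product_magnitudes(n_elements, trials, seed=0):
    """Magnitudes of a^H b for independent CSCG vectors of length n_elements."""
    rng = random.Random(seed)
    out = []
    for _ in range(trials):
        z = sum(cscg(rng).conjugate() * cscg(rng) for _ in range(n_elements))
        out.append(abs(z))
    return out


n = 64  # hypothetical number of IRS elements
mags = inner_product_magnitudes(n, trials=2000)
empirical_mean = sum(mags) / len(mags)
# Mean of a Rayleigh variable whose second moment E|z|^2 equals n.
rayleigh_mean = math.sqrt(math.pi * n) / 2
print(empirical_mean, rayleigh_mean)
```

For large `n` the two printed means should be close, which is consistent with the magnitude being approximately Rayleigh distributed.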
Citations: 2
A Feedback Capacity-Achieving Coding Scheme for the (d, ∞)-RLL Input-Constrained Binary Erasure Channel
Pub Date : 2022-04-14 DOI: 10.48550/arXiv.2204.06780
V. Rameshwar, N. Kashyap
This paper considers the memoryless input-constrained binary erasure channel (BEC). The channel input constraint is the $(d, \infty)$-runlength limited (RLL) constraint, which mandates that any pair of successive 1s in the input sequence be separated by at least $d$ 0s. We consider a scenario where there is causal, noiseless feedback from the decoder. We demonstrate a simple, labelling-based, zero-error feedback coding scheme, which we prove to be feedback capacity-achieving, and, as a by-product, obtain an explicit characterization of the feedback capacity. Our proof is based on showing that the rate of our feedback coding scheme equals an upper bound on the feedback capacity derived using the single-letter bounding techniques of Sabag et al. (2017). Moreover, using the tools of Thangaraj (2017), we show numerically that there is a gap between the feedback and non-feedback capacities of the $(d, \infty)$-RLL input-constrained BEC, at least for $d = 1, 2$.
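The $(d, \infty)$-RLL constraint itself is easy to state in code. The sketch below only illustrates the input constraint as defined in the abstract; it is not the paper's labelling-based feedback scheme, and the function names are hypothetical. It checks whether a binary sequence is admissible and counts admissible sequences of a given length, using the observation that a sequence either ends in 0 or ends in 1 preceded by $d$ 0s (or by the start of the sequence).

```python
import itertools


def satisfies_rll(bits, d):
    """True if every pair of successive 1s is separated by at least d 0s."""
    zeros_since_last_one = d  # no restriction applies before the first 1
    for b in bits:
        if b == 1:
            if zeros_since_last_one < d:
                return False
            zeros_since_last_one = 0
        else:
            zeros_since_last_one += 1
    return True


def count_rll_sequences(n, d):
    """Number of length-n binary sequences satisfying the (d, inf)-RLL constraint."""
    counts = [1]  # counts[k] = number of valid sequences of length k
    for k in range(1, n + 1):
        ending_in_zero = counts[k - 1]
        # A final 1 needs d preceding 0s; if the sequence is too short for
        # that, the only option is all 0s followed by the final 1.
        ending_in_one = counts[k - d - 1] if k - d - 1 >= 0 else 1
        counts.append(ending_in_zero + ending_in_one)
    return counts[n]


if __name__ == "__main__":
    # Cross-check the recurrence against brute-force enumeration.
    for d in (1, 2):
        for n in range(1, 9):
            brute = sum(satisfies_rll(seq, d)
                        for seq in itertools.product((0, 1), repeat=n))
            assert brute == count_rll_sequences(n, d)
        print(f"d = {d}: counts for n = 1..8 ->",
              [count_rll_sequences(n, d) for n in range(1, 9)])
```

For $d = 1$ the counts follow the Fibonacci recurrence, which is the familiar "no two adjacent 1s" constraint.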
Citations: 0