
International Conference on Signal Processing and Machine Learning: Latest Publications

A Stereo Matching with Reconstruction Network for Low-light Stereo Vision
Pub Date : 2019-11-27 DOI: 10.1145/3372806.3372821
Rui Tang, Geng Zhang, Xuebin Liu
To solve the problems in stereo matching of low-light images, this paper proposes a stereo matching network with a reconstruction module, built on the pyramid stereo matching network (PSMNet). Because low-light images suffer from severe and complex noise, an image reconstruction module is added to the traditional stereo matching network for automatic denoising. The reconstruction module assists the stereo matching module during model training, reducing the influence of noise on stereo matching and yielding more accurate results. The proposed method achieves good performance on a preprocessed Middlebury dataset. In addition, a low-light binocular platform was built to capture real low-light images and test the network in a night environment; results show that the disparity maps are more accurate than those of previous methods.
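The auxiliary role of the reconstruction module during training can be sketched as a joint objective; the L1 losses and the weighting factor `alpha` below are illustrative assumptions, not the paper's exact formulation:

```python
# Hedged sketch: joint objective pairing a disparity (stereo matching)
# loss with an image-reconstruction (denoising) loss, mirroring how the
# reconstruction module assists matching during training. `alpha` and
# the L1 form are assumptions for illustration.
def l1_loss(pred, target):
    """Mean absolute error between two equally sized sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def joint_loss(disp_pred, disp_gt, img_recon, img_clean, alpha=0.5):
    """Total loss = matching loss + alpha * reconstruction loss."""
    return l1_loss(disp_pred, disp_gt) + alpha * l1_loss(img_recon, img_clean)
```

During training, minimizing the second term forces the shared features to suppress noise, which in turn lowers the matching loss on noisy inputs.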
Citations: 2
Deep Neural Network-Based Scale Feature Model for BVI Detection and Principal Component Extraction
Pub Date : 2019-11-27 DOI: 10.1145/3372806.3372813
Lu Wang, Xiaorui Liu, Xiaoqing Hu, Luyang Guan, Ming Bao
The blade-vortex interaction (BVI) is a typical source of helicopter noise and has received significant attention in the fields of structural stealth and acoustic detection. In this paper, a hybrid scheme combining aerodynamic and acoustic analysis based on a deep neural network (DNN) is proposed to achieve a better understanding of the BVI. A DNN-based scale feature model (DNN-SFM) is constructed to describe the end-to-end relationship between the aero-acoustic parameters of the BVI signal and the optimal wavelet scale feature of the MZ discrete wavelet transform. Two novel DNN-SFM-based methods are proposed for BVI signal detection and principal component extraction; compared with traditional algorithms, they effectively reduce time complexity and improve robustness in a variety of noisy environments. Extensive experiments on simulated and real data verify the effectiveness of our methods.
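For intuition about what an "optimal wavelet scale feature" means, one classical baseline is to pick the scale whose coefficients carry the most energy; the energy criterion below is an illustrative assumption, whereas DNN-SFM instead predicts the optimal scale end to end from aero-acoustic parameters:

```python
# Hedged sketch: choose a wavelet scale by maximum coefficient energy.
# This energy rule is an assumed baseline, not the paper's DNN mapping.
def optimal_scale(coeffs_by_scale):
    """coeffs_by_scale: {scale: [coefficients]} -> scale with max energy."""
    return max(coeffs_by_scale,
               key=lambda s: sum(c * c for c in coeffs_by_scale[s]))
```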
Citations: 0
An Attention-Enhanced Recurrent Graph Convolutional Network for Skeleton-Based Action Recognition
Pub Date : 2019-11-27 DOI: 10.1145/3372806.3372814
Xiaolu Ding, Kai Yang, Wai Chen
The dynamic movement of the human skeleton has attracted increasing attention as a robust modality for action recognition. Because not all temporal stages and skeleton joints are informative for action recognition, and irrelevant information often introduces noise that degrades detection performance, extracting discriminative temporal and spatial features is an important task. In this paper, we propose a novel end-to-end attention-enhanced recurrent graph convolutional network (AR-GCN) for skeleton-based action recognition. An attention-enhancement mechanism is employed in AR-GCN to pay different levels of attention to different temporal stages and spatial joints, overcoming the information loss caused by using only keyframes and key joints. In particular, AR-GCN combines the graph convolutional network (GCN) with a bidirectional recurrent neural network (BRNN), retaining the expressive power of the original GCN on irregular joints while improving its sequential modeling ability through the recurrent network. Experimental results demonstrate the effectiveness of the proposed model on the widely used NTU and Kinetics datasets.
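The idea of weighting temporal stages instead of hard-selecting keyframes can be sketched as softmax-attention pooling over per-frame features; the dot-product scoring against a context vector is an illustrative assumption, not AR-GCN's exact mechanism:

```python
import math

# Hedged sketch: attention pooling over frame features, so informative
# temporal stages get larger weights rather than discarding frames.
# The scoring function is an assumption for illustration.
def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_pool(frame_feats, context):
    """Weight each frame feature by softmax(score) and sum them."""
    scores = [sum(f * c for f, c in zip(feat, context)) for feat in frame_feats]
    weights = softmax(scores)
    dim = len(frame_feats[0])
    return [sum(w * feat[i] for w, feat in zip(weights, frame_feats))
            for i in range(dim)]
```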
Citations: 8
Entrepreneurship and Role of AI
Pub Date : 2019-11-27 DOI: 10.1145/3372806.3374910
A. Sharma
The promotion of entrepreneurial activity has always been crucial to the economic development of successful nations. Entrepreneurs are leaders who innovate and invent ideas that stimulate economic growth. In the modern era, entrepreneurship is a key determinant of sustainable growth, and the literature describes the types of entrepreneurs that dominate explanations of economic growth. This study investigates several inhibitors of entrepreneurship in the economy of India, explores different motivators of entrepreneurship, and examines the impact of those motivators on economic growth and employment. A focus-group interview with entrepreneurs was conducted in 2017. Nowadays, advances in technology and artificial intelligence (AI) touch every sphere of our lives, and this paper focuses on the impact of AI on entrepreneurial activities. In general, the factors that enrich entrepreneurship include encouraging social entrepreneurship, improving the institutional environment, and support from international organisations. Practical implications for national growth are identified, such as improving institutional development, creating a supportive business environment with e-commerce, and promoting social entrepreneurship and security.
Citations: 1
Discrete Sidelobe Clutter Determination Method Based on Filtering Response Loss
Pub Date : 2019-11-27 DOI: 10.1145/3372806.3372812
He Wen, Chongdi Duan, Weiwei Wang, Yu Li
For air moving target detection with space-based radar (SBR), discrete sidelobe clutter is generally caused by strong scattering points in the sidelobe direction of the observation scene; because of its strong power and distinctive Doppler feature, it is difficult to distinguish from moving targets. To solve this problem, a discrete clutter determination method based on filtering response loss is proposed. First, the power of each potential target is calculated after clutter suppression; the power loss of the potential target is then obtained by comparison with its initial power. Finally, discrete sidelobe clutter is identified with an adaptive threshold, based on the criterion that after clutter suppression the power loss of discrete sidelobe clutter is high while that of a moving target is low. Simulation results on real measured data show the feasibility and effectiveness of the proposed method.
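The determination rule above can be sketched as a before/after power comparison; the dB formulation and the fixed threshold stand in for the paper's adaptive threshold and are illustrative assumptions:

```python
import math

# Hedged sketch: flag a candidate as discrete sidelobe clutter when the
# power loss caused by clutter-suppression filtering exceeds a threshold.
# The dB form and the 10 dB default are assumptions for illustration.
def response_loss_db(p_before, p_after):
    """Power loss (dB) incurred by clutter suppression filtering."""
    return 10.0 * math.log10(p_before / p_after)

def is_sidelobe_clutter(p_before, p_after, threshold_db=10.0):
    """High loss -> clutter; low loss -> likely a moving target."""
    return response_loss_db(p_before, p_after) > threshold_db
```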
Citations: 0
A Small-Footprint End-to-End KWS System in Low Resources
Pub Date : 2019-11-27 DOI: 10.1145/3372806.3372822
Gui-Xin Shi, Weiqiang Zhang, Hao Wu, Yao Liu
In this paper, we propose an efficient end-to-end architecture, based on Connectionist Temporal Classification (CTC), for a low-resource small-footprint keyword spotting (KWS) system. In a low-resource setting, it is difficult for the network to thoroughly learn the features of keywords. The intuition behind our new model is that a priori information about the keyword is available. In contrast to a conventional KWS system, we modify the label set by adding the preset keyword(s) to the original label set, enhancing learning performance and optimizing the system's final detection task. CTC is applied to address the sequential alignment problem, and GRU layers are employed for encoding because the dataset is small. Experiments on the WSJ0 dataset show that the proposed KWS system is significantly more accurate than the baseline system. Compared with a character-level-only KWS system, the proposed system clearly improves performance, and it works well under low-resource conditions, especially for long words.
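The label-set modification can be sketched as adding each preset keyword as a whole-word unit alongside the original character labels and the CTC blank; the symbol names and ordering below are illustrative assumptions:

```python
# Hedged sketch: augment a character label set with whole-keyword units
# for CTC training. "<blank>" and the ordering are assumptions.
CTC_BLANK = "<blank>"

def build_label_set(char_labels, keywords):
    """Original character labels + one unit per preset keyword + blank."""
    labels = [CTC_BLANK] + sorted(set(char_labels))
    labels += [kw for kw in keywords if kw not in labels]
    return labels
```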
Citations: 0
Multi-Task Learning Based End-to-End Speaker Recognition
Pub Date : 2019-11-27 DOI: 10.1145/3372806.3372818
Yuxuan Pan, Weiqiang Zhang
Recently, there has been increasing interest in end-to-end speaker recognition that directly takes the raw speech waveform as input, without hand-crafted features such as FBANK and MFCC. SincNet is a recently developed convolutional neural network (CNN) architecture in which the filters of the first convolutional layer are constrained to be band-pass filters (sinc functions). Experiments show that SincNet achieves a significantly lower frame error rate (FER) than traditional CNNs and DNNs. In this paper we demonstrate how to improve the performance of SincNet using multi-task learning (MTL). In the proposed SincNet architecture, besides the main task (speaker recognition), a phoneme recognition task is employed as an auxiliary task. The network uses sinc layers and convolutional layers as shared layers to improve its generality, and the outputs of the shared layers are fed into two different sets of fully connected layers for classification. Our experiments, conducted on the TIMIT corpus, show that the proposed SincNet-MTL architecture outperforms the standard SincNet architecture in both classification error rate (CER) and convergence rate.
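A SincNet-style band-pass kernel is the difference of two low-pass sinc filters with cutoffs f1 < f2; the sketch below uses plain floats in normalized frequency and omits the windowing and normalization of the actual SincNet layer, so treat it as an illustrative assumption:

```python
import math

# Hedged sketch of a band-pass FIR kernel as used conceptually by
# SincNet's first layer: g[n] = 2*f2*sinc(2*f2*n) - 2*f1*sinc(2*f1*n).
# Windowing/normalization details of the real layer are omitted.
def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def sinc_bandpass(f1, f2, length):
    """Symmetric FIR band-pass kernel of odd `length`, 0 < f1 < f2 < 0.5."""
    half = length // 2
    return [2 * f2 * sinc(2 * f2 * (n - half)) - 2 * f1 * sinc(2 * f1 * (n - half))
            for n in range(length)]
```

In SincNet proper, only f1 and f2 are learned per filter, which is what makes the first layer so parameter-efficient.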
Citations: 1
Implement AI Service into VR Training
Pub Date : 2019-11-27 DOI: 10.1145/3372806.3374909
J. Suttor, Julian Marin, Evan Verbus, Meng Su
In this paper, we describe the use of a collection of AI services in IBM Watson to facilitate user interaction in a virtual-reality training simulation. The project aims to increase the efficiency of employee training in an organization by creating an immersive 3D VR environment tailored to a specific profession. Current training methods usually require hiring an expert in the field to train employees personally. The main goal of the project is to create a standard training environment that companies can adopt and tailor to train employees without additional cost.
Citations: 3
A Vision-based Human Action Recognition System for Moving Cameras Through Deep Learning
Pub Date : 2019-11-27 DOI: 10.1145/3372806.3372815
Ming-Jen Chang, Jih-Tang Hsieh, C. Fang, Sei-Wang Chen
This study presents a vision-based human action recognition system using a deep learning technique. The system can recognize human actions successfully while the camera of a robot moves toward the target person from various directions, making the proposed method useful for the vision systems of indoor mobile robots. The system uses three types of information to recognize human actions: color videos, optical flow videos, and depth videos. First, Kinect 2.0 captures color and depth videos simultaneously using its RGB camera and depth sensor. Second, histogram of oriented gradients features are extracted from the color videos, and a support vector machine is used to detect the human region. Based on the detected human region, the frames of the color video are cropped, and the corresponding frames of the optical flow video are obtained using the Farnebäck method (https://docs.opencv.org/3.4/d4/dee/tutorial_optical_flow.html). The number of frames in these videos is then unified using a frame sampling technique. Subsequently, the three types of videos are fed separately into three modified 3D convolutional neural networks (3D CNNs), which extract the spatiotemporal features of human actions and recognize them. Finally, the recognition results are integrated to output the final result. The proposed system recognizes 13 types of human actions: drink (sit), drink (stand), eat (sit), eat (stand), read, sit down, stand up, use a computer, walk (horizontal), walk (straight), play with a phone/tablet, walk away from each other, and walk toward each other. The average recognition rate over 369 test human action videos was 96.4%, indicating that the proposed system is robust and efficient.
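The frame sampling step that unifies video lengths can be sketched as picking evenly spaced frame indices; the uniform-index scheme is an illustrative assumption about the sampling technique:

```python
# Hedged sketch: uniformly sample `target` frame indices from a clip of
# `total` frames so all input videos have the same length.
def sample_frame_indices(total, target):
    """Evenly spaced indices covering [0, total-1]."""
    if target == 1:
        return [0]
    step = (total - 1) / (target - 1)
    return [round(i * step) for i in range(target)]
```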
Citations: 11
Deep Activation Feature Maps for Visual Object Tracking 用于视觉对象跟踪的深度激活特征映射
Pub Date : 2018-11-28 DOI: 10.1145/3297067.3297088
Yang Li, Zhuang Miao, Jiabao Wang
Video object tracking is an important task with a broad range of applications. In this paper, we propose a novel visual tracking algorithm based on deep activation feature maps in a correlation filter framework. Deep activation feature maps are generated from convolutional neural network feature maps; they can discover the important parts of the tracking target and overcome shape deformation and heavy occlusion. In addition, the scale variation is estimated by another correlation filter operating on histogram of oriented gradient (HOG) features. Moreover, we integrate the final tracking result in each frame based on the appearance model and the scale model to further boost overall tracking performance. We validate the effectiveness of our approach on a challenging benchmark, where the proposed method shows outstanding performance compared with state-of-the-art tracking algorithms.
视频目标跟踪是一项具有广泛应用前景的重要任务。本文提出了一种基于相关滤波框架下深度激活特征映射的视觉跟踪算法。深度激活特征映射是由卷积神经网络特征映射生成的,它可以发现跟踪目标的重要部分,克服形状变形和严重遮挡。此外,通过另一个具有定向梯度直方图(HoG)特征的相关滤波器计算尺度变化。此外,我们基于外观模型和比例模型将最终跟踪结果整合到每一帧中,进一步提高整体跟踪性能。我们在一个具有挑战性的基准上验证了我们方法的有效性,其中所提出的方法与最先进的跟踪算法相比表现出出色的性能
{"title":"Deep Activation Feature Maps for Visual Object Tracking","authors":"Yang Li, Zhuang Miao, Jiabao Wang","doi":"10.1145/3297067.3297088","DOIUrl":"https://doi.org/10.1145/3297067.3297088","url":null,"abstract":"Video object tracking is an important task with a broad range of applications. In this paper, we propose a novel visual tracking algorithm based on deep activation feature maps in correlation filter framework. Deep activation feature maps are generated from convolution neural network feature maps, which can discover the important part of the tracking target and overcome shape deformation and heavy occlusion. In addition, the scale variation is calculated by another correlation filter with histogram of oriented gradient (HoG) features. Moreover, we integrate the final tracking result in each frame based on the appearance model and scale model to further boost the overall tracking performance. We validate the effectiveness of our approach on a challenging benchmark, where the proposed method illustrates outstanding performance compared with the state-ofthe-art tracking algorithms","PeriodicalId":340004,"journal":{"name":"International Conference on Signal Processing and Machine Learning","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114886362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0