
Latest publications: 2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML)

Research on Image Liquid Level Measurement Technology Based on Hough Transform
Pub Date : 2021-07-16 DOI: 10.1109/PRML52754.2021.9520745
Yanqing Fu, Yongqing Peng, P. Liu, Weikui Wang
Based on the camera imaging model and the storage tank measurement environment, this paper derives the correspondence between the real liquid level height and the radius of the liquid level contour in the image. The collected images first undergo grayscale conversion and morphological processing, and the Sobel operator is used for edge detection. Where appropriate, a morphological erosion operation is applied to reduce subsequent computation, and the circle is then detected with an optimized Hough transform to obtain its radius. Finally, the liquid level is obtained from the measurement model. Experimental results show that the maximum absolute error of the system is 2.99 mm and the maximum reference error is 0.75%. The system has both theoretical and practical significance.
Citations: 0
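The circle-detection step described above can be illustrated with a minimal Hough transform for a circle of known radius: each edge pixel votes for every candidate centre lying one radius away, and the accumulator peak is the detected centre. This is a simplified NumPy sketch of the classical algorithm, not the authors' optimized implementation:

```python
import numpy as np

def hough_circle_fixed_radius(edges, radius):
    """Vote for circle centres given a binary edge map and a known radius.

    Each edge pixel votes for all centres one `radius` away from it;
    the accumulator peak is the most likely circle centre.
    """
    h, w = edges.shape
    acc = np.zeros((h, w), dtype=np.int32)
    thetas = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
    ys, xs = np.nonzero(edges)
    for y, x in zip(ys, xs):
        cy = np.round(y - radius * np.sin(thetas)).astype(int)
        cx = np.round(x - radius * np.cos(thetas)).astype(int)
        ok = (cy >= 0) & (cy < h) & (cx >= 0) & (cx < w)
        np.add.at(acc, (cy[ok], cx[ok]), 1)  # accumulate votes
    return np.unravel_index(np.argmax(acc), acc.shape)  # (row, col) of centre

# Synthetic edge image: a circle of radius 20 centred at (50, 60).
img = np.zeros((100, 120), dtype=np.uint8)
t = np.linspace(0.0, 2.0 * np.pi, 720)
img[np.round(50 + 20 * np.sin(t)).astype(int),
    np.round(60 + 20 * np.cos(t)).astype(int)] = 1

center = hough_circle_fixed_radius(img, radius=20)
```

In the paper's setting the recovered radius, rather than the centre, feeds the measurement model; a full implementation would also sweep over candidate radii (as `cv2.HoughCircles` does).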
Driver’s Illegal Driving Behavior Detection with SSD Approach
Pub Date : 2021-07-16 DOI: 10.1109/PRML52754.2021.9520735
Tao Yang, Jin Yang, Jicheng Meng
In this paper, an advanced detection approach for illegal driving behavior is proposed using the Single Shot MultiBox Detector (SSD) based on deep learning. The detected illegal driving behaviors include cellphone usage, smoking, and not wearing a seat belt; detecting them can greatly reduce the occurrence of traffic accidents. To validate the detection performance of SSD on small target objects, such as cigarettes in complex environments, we use three online databases (the HMDB human motion database, the WIDER FACE database, and the Hollywood-2 database) as well as a real database collected by ourselves. The experimental results show that the SSD approach outperforms the Faster Region-based Convolutional Neural Network (Faster R-CNN) in detecting drivers' illegal driving behavior.
Citations: 0
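Single-shot detectors such as SSD emit many overlapping candidate boxes per object and prune them with non-maximum suppression (NMS) before reporting detections. A minimal NumPy sketch of that standard post-processing step (illustrative, not the authors' code):

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, format (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it."""
    order = np.argsort(scores)[::-1]  # indices by descending confidence
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) < iou_thresh]
    return keep

# Two near-duplicate detections of one object plus one distinct detection.
boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)  # → [0, 2]
```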
Text Detection in Tibetan Ancient Books: A Benchmark
Pub Date : 2021-07-16 DOI: 10.1109/PRML52754.2021.9520727
Xiangxiang Zhi, Dingguo Gao, Qijun Zhao, Shuiwang Li, Ci Qu
The digitization of Tibetan ancient books is of great significance to the preservation of Tibetan culture. The problem involves two tasks, Tibetan text detection and Tibetan text recognition, and the former is undoubtedly crucial to automatic recognition. However, there is little work on Tibetan text detection, and the lack of training data has always been a problem, especially for deep learning methods that require massive training data. In this paper, we introduce the TxTAB dataset for evaluating text detection methods on Tibetan ancient books. The dataset is built from 202 treasured handwritten ancient Tibetan text images and is densely annotated with a multi-point annotation scheme that does not limit the number of points. It is a challenging dataset with good diversity, containing blurred images, grayscale and color images, and text of different sizes and handwriting styles. An extensive experimental evaluation of three state-of-the-art text detection algorithms on TxTAB is presented with detailed analysis, and the results demonstrate that there is still considerable room for improvement, particularly in detecting Tibetan text in low-quality images.
Citations: 0
Transformer Based Multimodal Speech Emotion Recognition with Improved Neural Networks
Pub Date : 2021-07-16 DOI: 10.1109/PRML52754.2021.9520692
Rutherford Agbeshi Patamia, Wu Jin, Kingsley Nketia Acheampong, K. Sarpong, Edwin Kwadwo Tenagyei
With the progress of technology, the human-machine interaction research field has a growing need for robust automatic emotion recognition systems. Building machines that interact with humans by comprehending emotions paves the way for developing systems equipped with human-like intelligence. Previous architectures in this field often rely on RNN models; however, these models cannot intuitively learn in-depth contextual features. This paper proposes a transformer-based model that uses the speech data adopted by previous works, together with text and motion-capture (mocap) data, to optimize the performance of our emotion recognition system. Experimental results show that the proposed model outperforms the previous state of the art. The entire experiment was conducted on the IEMOCAP dataset.
Citations: 5
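The in-depth contextual features that RNNs struggle with come, in a transformer, from self-attention: every time step attends to every other. A minimal NumPy sketch of the core scaled dot-product attention operation (illustrative; dimensions and inputs here are toy assumptions, not the paper's model):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core transformer operation: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
seq = rng.normal(size=(5, 8))   # 5 time steps of 8-dim features (e.g. fused speech/text)
out, attn = scaled_dot_product_attention(seq, seq, seq)  # self-attention
```

Each output vector is a weighted mixture of all five input steps, so context from any distance in the sequence can influence any position in a single layer.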
Generation and Transformation Invariant Learning for Tomato Disease Classification
Pub Date : 2021-07-16 DOI: 10.1109/PRML52754.2021.9520693
Getinet Yilma, Kumie Gedamu, Maregu Assefa, Ariyo Oluwasanmi, Zhiguang Qin
Deep learning-based plant disease management has become a cost-effective way to improve agro-productivity. Advanced training-sample generation and augmentation methods enlarge the training set and improve the feature distribution, but they also introduce sample feature discrepancies, owing to the generative learning process and the artificial bias of augmentation. We propose a generation- and geometric-transformation-invariant feature learning method that uses Siamese networks with a maximum mean discrepancy (MMD) loss to minimize the feature distribution discrepancies arising from generated and augmented samples. Using a variational GAN and geometric transformations, we created four dataset settings to train the proposed approach. Extensive evaluation results on the PlantVillage tomato dataset demonstrate the effectiveness of the proposed ResNet50 Siamese network in learning generation- and transformation-invariant features for plant disease classification.
Citations: 1
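The maximum mean discrepancy loss used to align generated/augmented features with original ones can be estimated from two sample batches; it is near zero when both batches come from the same distribution. A NumPy sketch with an RBF kernel (the kernel choice and bandwidth here are illustrative assumptions, not values from the paper):

```python
import numpy as np

def rbf_mmd2(X, Y, sigma=1.0):
    """Squared maximum mean discrepancy between sample sets X and Y.

    MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)] under an RBF kernel;
    it approaches zero when X and Y are drawn from the same distribution.
    """
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

rng = np.random.default_rng(0)
same_a = rng.normal(0.0, 1.0, size=(200, 4))    # "original" features
same_b = rng.normal(0.0, 1.0, size=(200, 4))    # "augmented", same distribution
shifted = rng.normal(3.0, 1.0, size=(200, 4))   # distribution with feature shift

mmd_same = rbf_mmd2(same_a, same_b)   # small
mmd_diff = rbf_mmd2(same_a, shifted)  # large
```

Minimizing such a term across the two Siamese branches pulls the generated/augmented feature distribution toward the original one.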
DeepComp: A Deep Comparator for Improving Facial Age-Group Estimation
Pub Date : 2021-07-16 DOI: 10.1109/PRML52754.2021.9520698
Ebenezer Nii Ayi Hammond, Shijie Zhou, Hongrong Cheng, Qihe Liu
We introduce an age-group estimation scheme called DeepComp, a combination of an Early Information-Sharing Feature Aggregation (EISFA) mechanism and a ternary classifier. The EISFA part is a feature extractor that applies a Siamese layer to the input images, followed by an aggregation module that sums them up. The ternary classifier compares the image representations and yields one of three outcomes: younger, similar, or older. From these comparisons we obtain a score indicating the similarity between the input and reference images: the higher the score, the closer the similarity. Experiments show that our DeepComp scheme achieves an impressive 94.9% accuracy on the Adience benchmark dataset using a minimal number of reference images per age group. Moreover, we demonstrate the generality of our method on the MORPH II dataset, with an equally impressive result. Altogether, we show that, among other schemes, our method exemplifies facial age-group estimation.
Citations: 0
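The idea of turning pairwise comparisons into an age-group decision can be sketched abstractly: map each comparator score to one of the three outcomes, then pick the reference group the probe matches best. This is a hypothetical illustration of the decision logic only; the thresholds and the aggregation rule are assumptions, not taken from the paper:

```python
import numpy as np

def ternary_compare(score, lower=0.4, upper=0.6):
    """Map a comparator score in [0, 1] to younger / similar / older.

    Scores near the middle mean the probe resembles the reference group.
    The thresholds are illustrative assumptions, not the paper's values.
    """
    if score < lower:
        return "younger"
    if score > upper:
        return "older"
    return "similar"

def estimate_age_group(sim_scores):
    """Pick the reference age group with the highest similarity score."""
    return int(np.argmax(sim_scores))

labels = [ternary_compare(s) for s in (0.1, 0.5, 0.9)]
group = estimate_age_group(np.array([0.2, 0.7, 0.4]))  # best match: group 1
```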
Cardiac Arrhythmia Recognition Using Transfer Learning with a Pre-trained DenseNet
Pub Date : 2021-07-16 DOI: 10.1109/PRML52754.2021.9520710
Hadaate Ullah, Yuxiang Bu, T. Pan, M. Gao, Sajjatul Islam, Yuan Lin, Dakun Lai
Recent findings have demonstrated that deep neural networks can perform feature extraction themselves to identify electrocardiography (ECG) patterns or cardiac arrhythmias directly from ECG signals, in some cases producing results comparable to cardiologists. To face the challenge of the huge volume of data needed to train such networks, transfer learning is a promising mechanism: a network is trained on a large dataset, and the learned experience is transferred to a small target dataset. We first extracted 78,999 ECG beats from the MIT-BIH arrhythmia dataset, transformed them into 2D RGB images, and used them as inputs to a DenseNet. The DenseNet is initialized with weights trained on ImageNet and fine-tuned on the extracted beat images. Optimization of the pre-trained DenseNet is performed with the aid of on-the-fly augmentation, a weighted random sampler, and the Adam optimizer. The performance of the pre-trained model is assessed by hold-out evaluation and stratified 5-fold cross-validation with early stopping. The achieved accuracy in identifying normal beats and four arrhythmias is 98.90% for the hold-out evaluation and 100% for the stratified 5-fold cross-validation. The effectiveness of the pre-trained model with stratified 5-fold cross-validation via transfer learning surpasses state-of-the-art approaches and models, and it also generalizes well over the imbalanced classes.
Citations: 4
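The weighted random sampler mentioned above counters class imbalance (normal beats vastly outnumber arrhythmic ones) by drawing each sample with probability inversely proportional to its class frequency, so every class is seen equally often during training. A NumPy sketch of the weight computation, equivalent in spirit to PyTorch's `WeightedRandomSampler` but not the authors' code:

```python
import numpy as np

def sample_weights(labels):
    """Per-sample sampling probabilities inversely proportional to class frequency."""
    classes, counts = np.unique(labels, return_counts=True)
    freq = dict(zip(classes, counts))
    w = np.array([1.0 / freq[y] for y in labels])  # rare classes get large weights
    return w / w.sum()                             # normalize to a distribution

# Imbalanced toy label set: 900 normal beats, 90 and 10 of two arrhythmias.
labels = np.array([0] * 900 + [1] * 90 + [2] * 10)
w = sample_weights(labels)
per_class_mass = [w[labels == c].sum() for c in (0, 1, 2)]  # each ~1/3
```

Drawing training batches from `w` (e.g. via `np.random.choice(len(labels), p=w)`) yields roughly class-balanced batches without discarding any data.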
Research on the Methods of Speech Synthesis Technology
Pub Date : 2021-07-16 DOI: 10.1109/PRML52754.2021.9520718
Jinyao Hu, A. Hamdulla
An important technology for realizing human-computer interaction is converting a given text into natural speech, that is, speech synthesis. This paper succinctly expounds the development of speech synthesis, analyzes the shortcomings of traditional speech synthesis technology, and highlights the advantages and disadvantages of various vocoders. Given the overwhelming contribution of deep learning to the field of speech synthesis, this paper introduces several pioneering research results in the field, expounds their main ideas, advantages, and disadvantages, and draws new inspiration on this basis. Finally, it objectively discusses and analyzes the problems of speech synthesis technology and points out directions for further study.
Citations: 0
A Mixing and Separation Method of Signals + Color Images Based on Two-Dimensional CCA
Pub Date : 2021-07-16 DOI: 10.1109/PRML52754.2021.9520716
C. Kexin, Fan Liya, Yang Jing
Blind Source Separation (BSS) is a traditional and challenging problem in signal processing, in which mixed signals are separated by exploiting the independence of the source signals. The one-dimensional CCA-based method for mixing and separating signals and color images needs to reshape each image into a vector, which destroys the spatial structure of the image and degrades the recovery of the color image. To this end, this paper proposes a mixing and separation method for signals + color images based on two-dimensional CCA. The method utilizes the auto-correlation among the original color images and signals to recover signals and images of high quality. Comparative experiments against one-dimensional CCA on the COIL-100 dataset show that the proposed method is both effective and fast.
Citations: 0
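For reference, the classical one-dimensional CCA that the paper generalizes reduces to the singular values of the whitened cross-covariance between the two views. A NumPy sketch (a textbook formulation under simple synthetic data, not the paper's 2D extension):

```python
import numpy as np

def cca_correlations(X, Y, eps=1e-10):
    """Canonical correlations of two data matrices (rows = samples).

    Whitens each view and returns the singular values of
    Sxx^{-1/2} Sxy Syy^{-1/2}, which are the canonical correlations.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx, Syy, Sxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n

    def inv_sqrt(S):
        vals, vecs = np.linalg.eigh(S)            # S is symmetric PSD
        vals = np.clip(vals, eps, None)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    M = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    return np.linalg.svd(M, compute_uv=False)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
A = np.array([[2.0, 0.0, 1.0],      # fixed invertible mixing matrix
              [0.0, 1.0, 0.0],
              [1.0, 0.0, 3.0]])
Y = X @ A                            # Y is a noiseless linear mixture of X
rho = cca_correlations(X, Y)         # all canonical correlations ~ 1
```

Because `Y` is an invertible linear function of `X`, all canonical correlations equal one; the 2D variant applies the same idea directly to image matrices, avoiding the vectorization the abstract criticizes.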
Automatic Assessment of Facial Paralysis Based on Facial Landmarks
Pub Date : 2021-07-16 DOI: 10.1109/PRML52754.2021.9520746
Yuxi Liu, Zhimin Xu, L. Ding, Jie Jia, Xiaomei Wu
Unilateral peripheral facial paralysis is the most common form of facial paralysis. It affects only one side of the face, causing facial asymmetry. Clinically, unilateral peripheral facial paralysis is typically graded by clinicians using evaluation scales based on the patient's facial symmetry; a prevalent scale is the House-Brackmann grading system (HBGS). However, scale-based assessments are often highly subjective and introduce large interobserver and intraobserver variability. This manuscript therefore proposes an objective method that produces assessment results from facial videos using machine learning models. The grading method is based on the HBGS but is implemented automatically and with high objectivity. Images with facial expressions are extracted from the videos and analyzed by a machine learning model. Facial landmarks are obtained from the images using the 68-point model provided by dlib; the index and coordinate information of the landmarks is then used to compute pre-designed feature values to train the model and predict the result for new patients. Because facial paralysis samples are difficult to collect, the data size is limited. Random Forest (RF) and support vector machine (SVM) classifiers were compared. The method was applied to a dataset of 33 subjects. The highest overall accuracy reached 88.9%, confirming the effectiveness of the method.
Citations: 4
Journal
2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML)