
Latest Publications: 2021 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)

Detecting Psychological Stress from Speech using Deep Neural Networks and Ensemble Classifiers
Pub Date : 2021-10-13 DOI: 10.1109/sped53181.2021.9587430
Serban Mihalache, D. Burileanu, C. Burileanu
Speech stress detection remains an important research area, with applicability to fields and tasks such as remote monitoring, virtual-assistant software, forensic operations, and even health and safety. This paper proposes a deep learning system based on multiple Deep Neural Networks (DNNs) combined within an ensemble one-vs-one (OvO) classification strategy, using an extensive set of algorithmically extracted acoustic, prosodic, spectral, and cepstral features. The system was tested on the Speech Under Simulated and Actual Stress (SUSAS) database on 5-class subsets and groupings. Improvements were obtained over previously reported results, with an unweighted accuracy (UA) between 62.4% and 76.1%, depending on the number of classes and their grouping.
Citations: 2
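The ensemble one-vs-one strategy described above can be illustrated with off-the-shelf components — a minimal sketch assuming generic feature vectors, not the paper's actual DNN architecture or SUSAS features:

```python
# Sketch of an ensemble one-vs-one (OvO) classification strategy for a
# 5-class stress-detection problem. Random vectors stand in for the
# acoustic/prosodic/spectral/cepstral features of the paper.
import numpy as np
from sklearn.multiclass import OneVsOneClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 40))     # toy feature vectors
y = rng.integers(0, 5, size=300)   # 5 stress classes

# One binary network per class pair; prediction is by pairwise voting
# across the C(5,2) = 10 pairs.
ovo = OneVsOneClassifier(MLPClassifier(hidden_layer_sizes=(32,),
                                       max_iter=500, random_state=0))
ovo.fit(X, y)
print(len(ovo.estimators_))        # 10 pairwise classifiers
```

scikit-learn's `OneVsOneClassifier` trains one binary classifier per class pair and decides by majority vote, mirroring the OvO decomposition named in the abstract.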
A Fully Autonomous Person Re-Identification System
Pub Date : 2021-10-13 DOI: 10.1109/sped53181.2021.9587446
Roxana Mihaescu, Mihai Chindea, S. Carata, M. Ghenescu, C. Paleologu
The re-identification problem involves associating the appearances of a person captured by one or more surveillance cameras. This task is especially challenging in very crowded areas, where occlusions can drastically reduce visibility. In this paper, we aim to obtain a fully automatic re-identification system that includes a person-detection stage before the re-identification stage. Both stages are based on a general-purpose DNN (Deep Neural Network) object detector, the YOLO (You Only Look Once) model. The primary purpose and novelty of the proposed method is to obtain an autonomous re-identification system starting from a simple detection model. Thus, with minimal computational and hardware resources, the proposed method achieves results comparable with other existing methods, even when running in real time on multiple security cameras.
Citations: 0
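The matching step of a re-identification pipeline can be sketched as a nearest-neighbor search over appearance embeddings — a simplified illustration; the detector stage (e.g. YOLO) and the embedding extractor are assumed and not shown:

```python
# Minimal sketch of re-identification matching: compare the appearance
# embedding of a newly detected person against a gallery of known
# identities by cosine similarity, accepting the best match above a
# threshold. Embedding vectors here are illustrative placeholders.
import numpy as np

def cosine_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def reidentify(query_emb, gallery, threshold=0.7):
    """Return the gallery identity best matching the query embedding,
    or None if no similarity exceeds the threshold."""
    best_id, best_score = None, threshold
    for person_id, emb in gallery.items():
        score = cosine_sim(query_emb, emb)
        if score > best_score:
            best_id, best_score = person_id, score
    return best_id

gallery = {"person_A": np.array([1.0, 0.0, 0.0]),
           "person_B": np.array([0.0, 1.0, 0.0])}
print(reidentify(np.array([0.9, 0.1, 0.0]), gallery))  # person_A
```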
Speech under combined Schizophrenia and Salivary Flow Alterations – Preliminary Data and Results
Pub Date : 2021-10-13 DOI: 10.1109/sped53181.2021.9587347
H. Teodorescu, S. Andrian, C. Ghiorghe, Ș. Gheltu, Ionuţ Tărăboanţă
We describe a method and the initial phase of building a database for investigating the combined effects of schizophrenia and drug-induced salivary flow alterations on speech and, based on the preliminary results, we propose a quantitative method of assessing these effects. We believe that this is the first attempt to conduct a systematic study combining these two causes, with a narrow focus on speech changes related to fricatives.
Citations: 1
A Data-Reuse Approach for an Optimized LMS Algorithm
Pub Date : 2021-10-13 DOI: 10.1109/sped53181.2021.9587371
Alexandru-George Rusu, Laura-Maria Dogariu, S. Ciochină, C. Paleologu
Least-mean-square (LMS) algorithms are widely used in signal processing, especially in the system identification context. The classic LMS algorithm has a major drawback: its fixed step-size limits overall performance. The optimized LMS (LMSO) algorithm follows an optimization criterion and introduces a variable step-size, overcoming this drawback. Scenarios in which the unknown system changes have highlighted the need for the LMSO algorithm to model the new system more quickly. In this paper, we apply the data-reuse approach to the LMSO algorithm, aiming to increase its convergence rate. The simulations outline the performance improvement of the data-reuse method in combination with the LMSO algorithm.
Citations: 1
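The data-reuse idea can be sketched on a plain LMS filter — a minimal illustration with a fixed step-size; the paper's LMSO variable step-size is not reproduced here:

```python
# Sketch of a data-reuse LMS adaptive filter for system identification:
# each input/desired sample pair is reused several times per iteration,
# which speeds up convergence at the cost of extra computation.
import numpy as np

def data_reuse_lms(x, d, filter_len=8, mu=0.05, reuse=3):
    w = np.zeros(filter_len)
    for n in range(filter_len - 1, len(x)):
        u = x[n - filter_len + 1:n + 1][::-1]   # regressor [x[n], ..., x[n-L+1]]
        for _ in range(reuse):                  # reuse the same data pair
            e = d[n] - w @ u                    # a priori error
            w += mu * e * u                     # LMS update
    return w

rng = np.random.default_rng(1)
h = rng.normal(size=8)                          # unknown system to identify
x = rng.normal(size=2000)                       # white input signal
d = np.convolve(x, h)[:len(x)]                  # noiseless desired signal
w = data_reuse_lms(x, d)
print(np.max(np.abs(w - h)))                    # small residual misalignment
```

In the noiseless case the estimated coefficients `w` converge to the true system `h`; increasing `reuse` trades computation for a faster initial convergence.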
Project Vāc: Can a Text-to-Speech Engine Generate Human Sentiments?
Pub Date : 2021-10-13 DOI: 10.1109/sped53181.2021.9587366
S. Kulkarni, Luis Barbado, Jordan Hosier, Yu Zhou, Siddharth Rajagopalan, V. Gurbani
Sentiment analysis is an important area of natural language processing (NLP) research and is increasingly performed by machine learning models. Much of the work in this area concentrates on extracting sentiment from textual data sources. However, a textual source does not convey the pitch, prosody, or power of spoken sentiment, making it attractive to extract sentiments from an audio stream. A fundamental prerequisite for sentiment analysis on audio streams is the availability of reliable, appropriately labeled acoustic representations of sentiment. The lack of an existing large-scale dataset in this form forces researchers to curate audio datasets from a variety of sources, often by manually labeling the audio corpus. However, this approach is inherently subjective: what appears “positive” to one human listener may appear “neutral” to another. Such challenges yield sub-optimal datasets that are often class-imbalanced, and the inevitable biases present in the labeling process can permeate these models in problematic ways. To mitigate these disadvantages, we propose the use of a text-to-speech (TTS) engine to generate labeled synthetic voice samples rendered in one of three sentiments: positive, negative, or neutral. The advantage of using a TTS engine is that it can be abstracted as a function that generates an infinite set of labeled samples, on which a sentiment detection model can be trained. We investigate, in particular, the extent to which such training exhibits acceptable accuracy when the induced model is tested on a separate, independent and identically distributed speech source (i.e., the test dataset is not drawn from the same distribution as the training dataset). Our results indicate that this approach shows promise and that the induced model does not suffer from underspecification.
Citations: 0
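The "TTS as a labeled-sample generator" abstraction can be sketched as follows — a toy illustration in which the hypothetical `tts_render` function stands in for a real TTS API and merely simulates returning audio features:

```python
# Sketch of synthetic dataset generation from a TTS engine abstracted as a
# function. `tts_render` is a hypothetical stand-in, NOT a real TTS API:
# here it returns sentiment-dependent random feature vectors. The key
# property illustrated is that the generated dataset is perfectly
# class-balanced by construction.
import numpy as np

rng = np.random.default_rng(2)
SENTIMENTS = ["positive", "negative", "neutral"]

def tts_render(text, sentiment):
    """Hypothetical TTS call: returns a feature vector for the rendered
    audio, simulated here with sentiment-dependent random features."""
    base = SENTIMENTS.index(sentiment)
    return rng.normal(loc=base, size=16)

def generate_dataset(texts, n_per_sentiment):
    X, y = [], []
    for s in SENTIMENTS:                 # equal count per sentiment
        for _ in range(n_per_sentiment):
            X.append(tts_render(rng.choice(texts), s))
            y.append(s)
    return np.array(X), np.array(y)

X, y = generate_dataset(["hello there", "how are you"], 50)
print(X.shape, len(y))   # (150, 16) 150
```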
A Review of Automated Intelligibility Assessment for Dysarthric Speakers
Pub Date : 2021-10-13 DOI: 10.1109/sped53181.2021.9587400
Andy Huang, Kyle Hall, C. Watson, Seyed Reza Shahamiri
Automated dysarthria intelligibility assessment offers the opportunity to develop reliable, low-cost, and scalable tools, which help to solve current shortcomings of manual and subjective intelligibility assessments. This paper reviews the literature on automated intelligibility assessment, identifying the highest-performing published models and concluding with promising avenues for further research. Our review shows that most existing work achieved very high accuracies. However, we found that most of these studies validated their models using speech samples from the same speakers used in training, making their results less generalizable. Furthermore, there is a lack of study on how well these models perform on speakers from different datasets or different microphone setups. This lack of generalizability has implications for the real-life application of these models. Future research directions include more robust validation methods, such as testing on unseen speakers and incorporating speakers from different datasets. This would provide confidence that the models generalize and can therefore be used in real-world clinical practice.
Citations: 4
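The speaker-independent validation the review calls for can be implemented with group-aware cross-validation — a minimal sketch on synthetic data standing in for real dysarthric speech features:

```python
# Sketch of speaker-independent validation: GroupKFold splits by speaker,
# so no speaker's samples appear in both the training and test folds.
# This avoids the overly optimistic accuracies obtained when the same
# speakers are seen at training and test time.
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(3)
X = rng.normal(size=(120, 10))              # toy acoustic features
y = rng.integers(0, 2, size=120)            # e.g. intelligible vs. not
speakers = np.repeat(np.arange(12), 10)     # 12 speakers, 10 samples each

gkf = GroupKFold(n_splits=4)
for train_idx, test_idx in gkf.split(X, y, groups=speakers):
    # Every fold is speaker-disjoint between train and test.
    assert set(speakers[train_idx]).isdisjoint(speakers[test_idx])
print("all folds speaker-disjoint")
```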
Assessment of Pronunciation in Language Learning Applications
Pub Date : 2021-10-13 DOI: 10.1109/sped53181.2021.9587353
Camelia-Georgiana Stativă, Adrian Iftene, Camelia-Maria Milut
This paper proposes a smartphone application intended for use in the process of learning a new language. The application offers its users a series of exercises oriented towards word reproduction, aiming to enhance vocabulary while improving pronunciation, and it is capable of indicating flaws in the user's utterances. The targeted users are Romanian speakers willing to learn and practice English, with profiles for both children (beginners) and adults (advanced). The core of the application is the pronunciation module. Two methods of analysing pronunciation accuracy are presented, along with the benefits and disadvantages of each. Users of this application take advantage of two important factors in studying a foreign language: applying it and receiving continuous feedback.
Citations: 0
Speaker Verification Experiments using Identity Vectors, on a Romanian Speakers Corpus
Pub Date : 2021-10-13 DOI: 10.1109/sped53181.2021.9587396
Oana-Mariana Novac, Stefan-Adrian Toma, Emil Bureaca
One application of speaker recognition technologies is in the forensics field. It is reasonable to assume that target speakers are not always cooperative, i.e., there are no available recordings, and even when recordings exist, they are not always in the language for which the speaker was enrolled. In this study we present a set of experiments with an identity-vector speaker recognition system, trained and tested on a Romanian language corpus (RoDigits), along with an assessment of its performance when there is a mismatch between the training and testing languages.
Citations: 0
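The verification decision on identity vectors is commonly made by cosine scoring against a threshold — a minimal sketch; real systems extract i-vectors from a trained UBM/total-variability model, while random toy vectors stand in here:

```python
# Sketch of i-vector speaker verification by cosine scoring: compare the
# enrollment identity vector with a test identity vector and accept when
# the cosine similarity exceeds a decision threshold.
import numpy as np

def cosine_score(enroll, test):
    return float(enroll @ test / (np.linalg.norm(enroll) * np.linalg.norm(test)))

def verify(enroll, test, threshold=0.5):
    return cosine_score(enroll, test) >= threshold

enroll = np.array([0.6, 0.8, 0.0])    # toy enrollment i-vector
same   = np.array([0.5, 0.85, 0.1])   # close in direction -> accept
other  = np.array([-0.7, 0.1, 0.7])   # far -> reject
print(verify(enroll, same), verify(enroll, other))  # True False
```

In practice the threshold is tuned on a development set (e.g. at the equal error rate), and language mismatch between enrollment and test typically shifts the score distributions.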
Improvements of SpeeD’s Romanian ASR system during ReTeRom project
Pub Date : 2021-10-13 DOI: 10.1109/sped53181.2021.9587383
Alexandru-Lucian Georgescu, H. Cucu, C. Burileanu
Automatic speech recognition (ASR) for the Romanian language is attracting growing interest from the scientific community. In the last two years several research groups have reported valuable results on speech recognition and dialogue tasks for Romanian. In this paper we present the improvements we recently obtained by collecting and using more text and audio data to train the language and acoustic models. We emphasize the automatic methodologies employed to facilitate data collection and annotation. In comparison to our previous work, we report state-of-the-art results for read speech (WER of 1.6%) and significantly better results on spontaneous speech (relative improvement of around 40%). To facilitate direct comparison with other ASR systems, we release all evaluation datasets, totaling 10 hours of manually annotated speech.
Citations: 1
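The word error rate (WER) metric reported above is the word-level Levenshtein distance normalized by the reference length — a self-contained sketch:

```python
# Word error rate: (substitutions + deletions + insertions) / reference
# word count, computed with the classic edit-distance dynamic program.
def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
    return d[-1][-1] / len(ref)

print(wer("the cat sat", "the cat sat"))   # 0.0
print(wer("the cat sat", "the bat sat"))   # one substitution -> 1/3
```

A reported WER of 1.6% thus means roughly 1.6 word errors per 100 reference words.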
A Cognitive View on Intonational Meaning
Pub Date : 2021-10-13 DOI: 10.1109/sped53181.2021.9587358
D. Jitca
The paper proposes a cognitive view of intonational contours, aiming to describe their partitions in terms of elementary cognitive categories related to a generic information packaging (IPk) mechanism. We formulate the hypothesis that IPk structures pack auditory information items at the cortical level into relations that are marked at the utterance level by prosodic phrases. An IPk model based on this hypothesis is used to describe two pairs of contours presented in [1] as problematic for a categorical phonological description. The paper proposes a cognitive description of the respective contours after partitioning them into hierarchies of nested IPk units. In this view, phonological events become marks of functional constituents at the cognitive level, and semantic differences between contours are reflected by their structural cognitive differences.
Citations: 1