2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA): Latest Publications
Using Taigi Dramas with Mandarin Chinese Subtitles to Improve Taigi Speech Recognition
Pin-Yuan Chen, Chia-Hua Wu, Hung-Shin Lee, Shao-Kang Tsao, M. Ko, Hsin-Min Wang
Pub Date: 2020-11-05 | DOI: 10.1109/O-COCOSDA50338.2020.9295005
An obvious problem with automatic speech recognition (ASR) for Taigi is that the amount of training data is far from sufficient to build a practical ASR system. Collecting speech data with reliable transcripts for training the acoustic model (AM) is feasible but expensive. Moreover, text data for language model (LM) training is extremely scarce and difficult to collect, because Taigi is primarily a spoken language rather than a commonly written one. Interestingly, the subtitles of Taigi dramas in Taiwan have long been written in Chinese characters for Mandarin. Since a large number of Taigi drama episodes with Mandarin Chinese subtitles are available on YouTube, we propose a method to augment the training data for the AM and LM of Taigi ASR. The idea is to use an initial Taigi ASR system to convert a Mandarin Chinese subtitle into the most likely Taigi word sequence by referring to the speech. Experimental results show that our ASR system can be remarkably improved by such training data augmentation.
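The core idea (pick the Taigi word sequence for a subtitle that the initial ASR system scores highest against the audio) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the mini-lexicon entries and the stub scorer are hypothetical placeholders for a real translation lexicon and a real acoustic likelihood.

```python
from itertools import product

# Hypothetical mini-lexicon mapping each Mandarin subtitle word to
# candidate Taigi words (all entries are illustrative placeholders).
lexicon = {"我": ["gua"], "吃": ["tsiah", "lim"], "飯": ["png", "bi"]}

def best_taigi_sequence(subtitle, asr_score):
    """Enumerate candidate Taigi word sequences for a subtitle and keep
    the one the initial ASR system scores highest against the speech."""
    candidates = product(*(lexicon[w] for w in subtitle))
    return max(candidates, key=asr_score)

# Stub scorer standing in for the acoustic likelihood of the utterance.
score = lambda seq: 1.0 if seq == ("gua", "tsiah", "png") else 0.0
print(best_taigi_sequence(["我", "吃", "飯"], score))  # ('gua', 'tsiah', 'png')
```

In practice the candidate space would be pruned with a lattice or N-best rescoring rather than exhaustive enumeration.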
An Analysis of Acoustic Features in Reading Speech from Chinese Patients with Depression
Yuan Jia, Yuzhu Liang, T. Zhu
Pub Date: 2020-11-05 | DOI: 10.1109/O-COCOSDA50338.2020.9295039
This paper investigates the acoustic features of depression patients in terms of voice quality and formants, from the perspective of experimental phonetics. The analysis of voice quality, based on large samples, shows that jitter, shimmer, and HNR can distinguish patients with different degrees of depression, while F0, the standard deviation of F0, and HNR can distinguish depression patients from non-patients. These features indicate that patients' voices tend to be hoarse and rough, with a lower pitch falling within a narrower range. The formant analysis shows that depression patients tend to centralize monophthongs and simplify diphthongs, reflecting a smaller opening degree and slower movement of the tongue. Moreover, patients tend to show lower spectral energy than healthy people. Finally, our results suggest that these acoustic features can be used as objective markers for the recognition of depression.
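As a rough illustration of the perturbation measures this paper relies on, local jitter and shimmer can be estimated from a sequence of pitch-period durations and cycle peak amplitudes. This is a simplified sketch; standard tools such as Praat apply additional normalizations and voicing checks.

```python
def local_jitter(periods):
    """Relative jitter: mean absolute difference between consecutive
    pitch periods, normalized by the mean period."""
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

def local_shimmer(amplitudes):
    """The analogous measure computed on cycle peak amplitudes."""
    diffs = [abs(b - a) for a, b in zip(amplitudes, amplitudes[1:])]
    return (sum(diffs) / len(diffs)) / (sum(amplitudes) / len(amplitudes))

# Perfectly periodic voicing has zero jitter; alternating periods do not.
print(local_jitter([5.0, 5.0, 5.0, 5.0]))  # 0.0
print(local_jitter([4.0, 6.0, 4.0, 6.0]))  # 0.4
```

Higher values on both measures correspond to the hoarse, rough voice quality the paper associates with depression.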
Question Answering based University Chatbot using Sequence to Sequence Model
Naing Naing Khin, K. Soe
Pub Date: 2020-11-05 | DOI: 10.1109/O-COCOSDA50338.2020.9295021
Educational chatbots have great potential to help students, teachers, and education staff by providing useful information to inquirers in the education sector. Neural chatbots are more scalable and popular than earlier rule-based chatbots. A Recurrent Neural Network (RNN) based Sequence-to-Sequence (Seq2Seq) model can be used to create chatbots; Seq2Seq is well suited to conversational modeling over sequences, especially in question-answering systems. In this paper, we explore communication through a neural chatbot built on the Seq2Seq model with an attention mechanism, based on an RNN encoder-decoder. The chatbot is intended for use in the university education sector to answer frequently asked questions about the university and related information. It is the first university chatbot for the Myanmar language built with a neural network model, and it achieves a BLEU score of 0.41.
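The BLEU score reported above compares generated answers with reference answers via clipped n-gram precision. A minimal sentence-level variant (geometric mean of unigram and bigram precisions times a brevity penalty, with no smoothing) can be sketched as:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(reference, hypothesis, max_n=2):
    """Simplified sentence-level BLEU: geometric mean of clipped n-gram
    precisions multiplied by a brevity penalty (no smoothing)."""
    precisions = []
    for n in range(1, max_n + 1):
        hyp, ref = ngrams(hypothesis, n), ngrams(reference, n)
        overlap = sum(min(c, ref[g]) for g, c in hyp.items())
        precisions.append(overlap / max(1, sum(hyp.values())))
    if min(precisions) == 0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    bp = min(1.0, math.exp(1 - len(reference) / len(hypothesis)))
    return bp * math.exp(log_mean)

print(sentence_bleu(["the", "cat", "sat"], ["the", "cat", "sat"]))  # 1.0
```

Production evaluations typically use corpus-level BLEU with 4-gram precision and smoothing, so scores from this sketch are not directly comparable to the paper's 0.41.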
Collection and Analyses of Exemplary Speech Data to Establish Easy-to-Understand Speech Synthesis for Japanese Elderly Adults
Hideharu Nakajima, Y. Aono
Pub Date: 2020-11-05 | DOI: 10.1109/O-COCOSDA50338.2020.9295000
This paper describes a newly developed Japanese speech database intended to identify the speech features and speaking styles that elderly adults actually find easy to understand, toward establishing speech synthesis for elderly adults. The database has two distinguishing features: i) its sentences are largely taken from newsletters, going beyond just the content that elderly adults tend to know; ii) the sentences are spoken by exemplary speakers selected through an audition process from the perspective of what elderly adults actually find easy to understand. This paper describes the design of the database and its basic characteristics as measured by conventional theories. Finally, it indicates directions for extending those theories to establish an easy-to-understand speech synthesis method for elderly adults.
A Front-End Technique for Automatic Noisy Speech Recognition
Hay Mar Soe Naing, Risanuri Hidayat, Rudy Hartanto, Y. Miyanaga
Pub Date: 2020-11-05 | DOI: 10.1109/O-COCOSDA50338.2020.9295006
Sounds in a real environment rarely occur in isolation; they form complex mixtures and usually occur concurrently. Auditory masking refers to the perceptual interaction between such sound components. This paper proposes modeling the effect of simultaneous masking within Mel-frequency cepstral coefficient (MFCC) extraction, which effectively improves the performance of the resulting system. Moreover, Gammatone frequency integration is used to warp the energy spectrum, providing gradually decaying weights and compensating for the loss of spectral correlation. Experiments are carried out on the Aurora-2 database, and frame-level cross-entropy-based deep neural network (DNN-HMM) training is used to build the acoustic model. With models trained on multi-condition speech data, the accuracy of our proposed feature extraction method reaches 98.14% at 10 dB SNR, 94.40% at 5 dB, 81.67% at 0 dB, and 51.5% at −5 dB.
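To make the simultaneous-masking idea concrete, here is a crude sketch in which each mel band raises a masking floor for its neighbors, decaying linearly in dB with band distance; band energies below the floor are replaced by it. The spreading function and its slope are illustrative assumptions, not the paper's model.

```python
import numpy as np

def apply_masking(mel_energies, spread_db=10.0):
    """Crude simultaneous-masking sketch: each band projects a masking
    floor onto its neighbours, decaying by `spread_db` dB per band;
    energies below the floor are raised to it. The spreading shape is
    an illustrative assumption, not the paper's exact model."""
    e_db = 10 * np.log10(np.maximum(mel_energies, 1e-12))
    n = len(e_db)
    floor = np.full(n, -np.inf)
    for i in range(n):            # every band acts as a potential masker
        for j in range(n):
            floor[j] = max(floor[j], e_db[i] - spread_db * abs(i - j))
    return 10 ** (np.maximum(e_db, floor) / 10)

# A single strong band spreads energy onto its silent neighbours.
print(apply_masking(np.array([1e-12, 1e-12, 100.0, 1e-12, 1e-12])))
```

The masked energies would then feed the usual log-and-DCT steps of MFCC extraction.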
Blind Phone Segmentation Using Contrast Function
Dac-Thang Hoang, Van-Thuy Mai, Tung-Lam Phi
Pub Date: 2020-11-05 | DOI: 10.1109/O-COCOSDA50338.2020.9295035
Phone segmentation is the process of detecting the boundaries between phones in a spoken utterance. In this paper, phone boundaries are detected without knowledge of the speech content. Contrast, a concept from image processing, is investigated for phone segmentation. The speech signal is first transformed into the frequency domain. Band energy is then extracted and treated as luminance in an image. A contrast function of each frequency band is defined on the band energy, and the peaks of the contrast-function curve indicate phone boundaries. The boundaries detected in eight bands are combined using a probability mass function. Experiments on the TIMIT corpus yield promising results, and the method also performs well on a Vietnamese corpus.
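The pipeline for one band (band energy, a contrast measure, then peak picking) can be sketched as below. The Michelson-style contrast `(max - min) / (max + min)` over a sliding window is a generic image-processing choice; the paper's exact contrast function and its eight-band combination may differ.

```python
import numpy as np

def band_contrast(energy, w=1):
    """Local contrast of a band-energy sequence: (max - min) / (max + min)
    over a sliding window of half-width w (a generic image-style contrast
    measure; the paper's definition may differ)."""
    c = np.zeros(len(energy))
    for t in range(len(energy)):
        seg = energy[max(0, t - w):t + w + 1]
        hi, lo = seg.max(), seg.min()
        c[t] = (hi - lo) / (hi + lo + 1e-12)
    return c

def peaks(c, thr=0.5):
    """Indices of local maxima above a threshold: candidate boundaries."""
    return [t for t in range(1, len(c) - 1)
            if c[t] > thr and c[t] > c[t - 1] and c[t] >= c[t + 1]]

# A step in band energy (a phone change) produces a contrast peak there.
energy = np.array([1.0] * 5 + [10.0] * 5)
print(peaks(band_contrast(energy)))  # [4]
```

In the full method, per-band boundary candidates like these are merged across the eight bands via a probability mass function before the final decision.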
Acoustic modeling for Thai-English code-switching speech
Vataya Chunwijitra, Sumonmas Thatphithakkul, P. Chootrakool, S. Kasuriya
Pub Date: 2020-11-05 | DOI: 10.1109/O-COCOSDA50338.2020.9295026
Owing to globalization, mixing Thai and English has become common in everyday conversation in Thailand, even when talking with Thai natives. Consequently, Thai automatic speech recognition systems deployed in multilingual communities must be able to handle Thai-English code-switching. One of the main challenges in building such a system is selecting a phone set for Thai-English pairs, since a mother-tongue-like accent interferes with English pronunciation. This paper shows evidence that an acoustic model using a Thai phoneme set improves recognition performance for Thai-English code-mixed speech. The baseline system for comparison was built with a merged phone set in which the Thai and English phones were simply combined. The experimental results show that the monolingual Thai phone set reduces the word error rate by 4.5%.
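The word error rate used to compare the two phone sets is the word-level Levenshtein distance between reference and hypothesis, divided by the reference length. A standard dynamic-programming sketch:

```python
def wer(reference, hypothesis):
    """Word error rate: minimum edits (substitutions, insertions,
    deletions) turning the hypothesis into the reference, divided by
    the number of reference words."""
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                      # delete all remaining ref words
    for j in range(len(h) + 1):
        d[0][j] = j                      # insert all remaining hyp words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

print(wer("sawasdee hello world", "sawasdee hello word"))  # one substitution
```

Scoring toolkits such as NIST's sclite compute the same statistic with alignment reports on top.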
Intonation Patterns of Wh-questions by EFL Learners from Jinan Dialectal Region
Ai-jun Li, Chunru Qu, Na Zhi
Pub Date: 2020-11-05 | DOI: 10.1109/O-COCOSDA50338.2020.9295038
This study investigates the intonation patterns of wh-questions produced by EFL learners from the Jinan dialectal region, from a phonological perspective. When the nuclear accent is located at the beginning of a sentence, both American speakers and Jinan EFL learners adopt a falling intonation pattern. When the nuclear accent is in the middle or at the end of a sentence, Americans use a high tone (H*), while Jinan learners use either a high tone (H*) or a low tone (L*). The final boundary tone of wh-questions produced by both groups ends with L%. Jinan learners tend to accent the wh-word regardless of where the nuclear accent lies, using H* or L+H*, and the patterns they use for other pitch accents also vary considerably.
Country report (Korea)
Pub Date: 2020-11-05 | DOI: 10.1109/o-cocosda50338.2020.9295030
This article consists only of a collection of slides from the author's conference presentation.
Improving Valence Prediction in Dimensional Speech Emotion Recognition Using Linguistic Information
Bagus Tris Atmaja, M. Akagi
Pub Date: 2020-11-05 | DOI: 10.1109/O-COCOSDA50338.2020.9295032
In dimensional emotion recognition, the valence-arousal-dominance model is widely used. Current research in dimensional speech emotion recognition has shown that valence prediction performs worse than arousal and dominance prediction. This paper presents an approach to this problem: improving the low valence score by utilizing linguistic information. Our approach fuses acoustic features with linguistic features obtained by converting words to vectors. The results doubled the performance of valence prediction in both single-task learning with a single output (predicting valence only) and multitask learning with multiple outputs (predicting valence, arousal, and dominance). A proper combination of acoustic and linguistic features not only improved valence prediction but also improved arousal and dominance predictions in multitask learning.
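The fusion step described above can be illustrated as simple early fusion: concatenating per-utterance acoustic features with averaged word vectors, then feeding one head that emits all three dimensions. The feature dimensions and the linear head below are illustrative assumptions; the paper's actual networks are more elaborate.

```python
import numpy as np

rng = np.random.default_rng(0)
acoustic = rng.normal(size=(8, 40))     # 8 utterances x 40 acoustic dims (assumed)
linguistic = rng.normal(size=(8, 300))  # 8 utterances x 300-dim word-vector means

# Early fusion: concatenate the two feature views per utterance.
fused = np.concatenate([acoustic, linguistic], axis=1)

# Multitask output: a single (here linear, illustrative) head predicting
# valence, arousal, and dominance jointly from the fused features.
W = rng.normal(size=(fused.shape[1], 3)) * 0.01
vad = fused @ W
print(vad.shape)  # one (valence, arousal, dominance) triple per utterance
```

Sharing the fused representation across the three outputs is what lets the linguistic signal, added for valence, also benefit arousal and dominance in the multitask setting.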