2017 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment (O-COCOSDA): Latest Publications
Spectral analysis of English voiced palato-alveolar fricative /Ʒ/ produced by Chinese WU Speakers
Pub Date: 2017-11-01 | DOI: 10.1109/ICSDA.2017.8384457
Jie Liang, Minyi Yu, Wenjun Chen
In this study, we investigated the production of the English voiced palato-alveolar fricative /Ʒ/ by Chinese Wu speakers by statistically analyzing two spectral parameters of the native and accented sounds. Results showed that the voiced accented /Ʒ/ was significantly influenced by the voiceless /ɕ/ rather than by the voiced /Ʒ/ of the Wu dialect; female Wu speakers tended to over-palatalize the accented sound, suggesting that females may be more influenced by social stereotypes than males in second language acquisition; and students with a higher level of English were more susceptible to Wu dialect influence than those with a lower level, indicating that better phonological awareness does not necessarily lead to higher accuracy in phonetic production.
Spoken English fluency scoring using convolutional neural networks
Pub Date: 2017-11-01 | DOI: 10.1109/ICSDA.2017.8384444
Hoon Chung, Y. Lee, Sung Joo Lee, J. Park
In this paper, we propose a spoken English fluency scoring method that uses a Convolutional Neural Network (CNN) to learn feature extraction and the scoring model jointly from raw time-domain signal input. In general, automatic spoken English fluency scoring is composed of feature extraction and a scoring model. Feature extraction computes the feature vectors assumed to represent spoken English fluency, and the scoring model predicts the fluency score of an input feature vector. Although the conventional approach works well, there are issues regarding feature extraction and model parameter optimization. First, because the fluency features are computed based on human knowledge, some crucial representations present in a raw data corpus can be missed. Second, each component of the model is optimized separately, which can lead to suboptimal performance. To address these issues, we propose a CNN-based approach that extracts fluency features directly from a raw data corpus without hand-crafted engineering and optimizes all model parameters jointly. The effectiveness of the proposed approach is evaluated using the Korean-Spoken English Corpus.
Pitch heights and contours of tones in weifang dialect: An experimental investigation
Pub Date: 2017-11-01 | DOI: 10.1109/ICSDA.2017.8384461
Xiangting Yin, Ying Chen
An experiment was conducted to investigate the pitch heights and contours of tones in Weifang Dialect. Eight middle-aged native speakers of Weifang Dialect participated in the experimental fieldwork. The tone types and values described in Documentation of Weifang Dialect [1] were taken as a reference. Acoustic data, including duration and fundamental frequency (F0), were extracted with a Praat script, ProsodyPro [2], to track the pitch height and contour of each tone-syllable sequence in the stimuli. F0 values were normalized to Log-Z (LZ) scores to control for speakers' individual differences [3, 4]. Tone values derived from the LZ scores were determined according to the five-scale annotation of tones [5]. The tones in Weifang Dialect were found to be 324, 52, 44, and 41 in the experiment, in contrast to the 213, 42, 55, and 21 reported in Documentation of Weifang Dialect. Syllable durations of the different tones were also examined and compared.
Development of distant multi-channel speech and noise databases for speech recognition by in-door conversational robots
Pub Date: 2017-11-01 | DOI: 10.1109/ICSDA.2017.8384419
Youngjoo Suh, Younggwan Kim, Hyungjun Lim, Jahyun Goo, Youngmoon Jung, Yeon-Ji Choi, Hoirin Kim, Dae-Lim Choi, Yong-Ju Lee
In this paper, we present the method and procedure for collecting Korean distant multi-channel speech and noise databases, designed for developing highly accurate distant speech recognition systems for indoor conversational robot applications. The speech database was collected at four distant positions in an indoor room, furnished to simulate a living room acoustically, using a playback-and-recording method with an artificial mouth to play the clean source speech and three kinds of multi-channel microphone arrays to record the distant speech. The speech database consists of a read speech dataset and two conversational speech datasets. Additionally, the noise database consists of 12 types of indoor noise, collected at a single distant position with the same approach. These speech and noise databases can be used to create simulated noisy speech data reflecting various indoor acoustic conditions corrupted by room reverberation and additive noise.
Spoken query for Qur'anic verse information retrieval
Pub Date: 2017-11-01 | DOI: 10.1109/ICSDA.2017.8384422
Taufik Ridwan, D. Lestari
Most information retrieval (IR) systems for the Qur'an use text as their input query, represented in either the Latin alphabet or the Arabic script. This requires the user to know how to write the query. When searching for Qur'an verses, a user may know how to pronounce the query but lack the knowledge of how to write it in Arabic letters. In this case, speech can serve as an alternative input to the IR system. In this work, we develop a spoken-query IR system whose automatic speech recognition component is based on Hidden Markov Model acoustic models and an n-gram language model. Both models are trained on all verses of the Qur'an. The Inference Network Model and the well-known Vector Space Model are employed for the IR component. For the speech recognition system, the average word error rates are 7.41% for closed speakers and 18.53% for open speakers. For the IR system, the best query formulation for the Inference Network Model is achieved with input queries consisting of two-word phrases, with an average Mean Reciprocal Rank of 0.922475, while for the Vector Space Model it is achieved with one-word queries, with an average Mean Reciprocal Rank of 0.9308.
Detecting oxymoron in a single statement
Pub Date: 2017-11-01 | DOI: 10.1109/ICSDA.2017.8384447
Won Ik Cho, Woohyun Kang, Hyeon Seung Lee, N. Kim
This paper proposes a novel evaluation scheme for word vector representations using oxymorons, a special kind of contradiction arising from semantic discrepancy between a pair of words. A proper word vector representation is expected to yield a remarkable result under the proposed scheme and evaluation.
Construction and analysis of Indonesian-interviews deception corpus
Pub Date: 2017-11-01 | DOI: 10.1109/ICSDA.2017.8384472
Tifani Warnita, D. Lestari
In this paper, we present the first deception corpus in Indonesian, built to support deception detection based on statistical machine learning approaches, given the importance of data in related studies. We collected speech recordings, along with high-frame-rate video, from 30 subjects to develop the Indonesian Deception Corpus (IDC). Using financial motivation as its basic scenario, IDC consists of 5542 speech segments with a total duration of approximately 16 hours and 34 minutes. The corpus is imbalanced: the majority class of truth segments is almost four times larger than the lie segments. We also performed experiments using only the speech corpus, along with the transcriptions. Using a combination of paralinguistic, prosodic, and lexical features, we obtained a best accuracy of 61.26% and F-measure of 61.30% using a Random Forest classifier with random undersampling (RUS).
Inclusion of manner of articulation to achieve improved phoneme classification accuracy for Bengali continuous speech
Pub Date: 2017-11-01 | DOI: 10.1109/ICSDA.2017.8384455
Tanmay Bhowmik, S. Mandal
In this experiment, a phoneme classification model was developed using a Deep Neural Network based framework. The experiment was conducted in two phases. In the first phase, the phoneme classification task was performed; the deep-structured model provided a good overall classification accuracy of 87.8%, with precision and recall values reported for all phonemes. A confusion matrix of all the Bengali phonemes was derived and used to classify the phonemes into nine groups. These nine groups provided a better overall classification accuracy of 98.7%, and a new confusion matrix derived for these nine groups showed a lower confusion rate. In the second phase, the nine groups were reclassified into 15 groups using knowledge of the manner of articulation, and the deep-structured model was retrained. The system then provided 98.9% overall classification accuracy, almost equal to that observed for the nine groups; but because the nine groups were redivided into 15, the phoneme confusion within a single group became smaller, which leads to a better phoneme classification model.
Development of text and speech corpus for an Indonesian speech-to-speech translation system
Pub Date: 2017-11-01 | DOI: 10.1109/ICSDA.2017.8384448
M. T. Uliniansyah, Hammam Riza, Agung Santosa, Gunarso, Made Gunawan, Elvira Nurfadhilah
This paper describes our natural language resources, especially text and speech corpora, for developing an Indonesian speech-to-speech translation (S2ST) system. The corpora are used to create models for Automatic Speech Recognition (ASR), Statistical Machine Translation (SMT), and Text-to-Speech (TTS) systems. They have been collected since 1987 from various sources and projects, such as the Multilingual Machine Translation System (MMTS), PAN Localization, ASEAN MT, and U-STAR. Text corpora are created either by collecting from online resources or by translating manually from textual sources. Speech corpora come from several recording projects. The availability of these corpora enables us to develop an Indonesian speech-to-speech translation system.
Rediscovering 50 years of discoveries in speech and language processing: A survey
Pub Date: 2017-11-01 | DOI: 10.1109/ICSDA.2017.8384413
J. Mariani, Gil Francopoulo, P. Paroubek, F. Vernier, Nam Kyun Kim, Moon Ju Jo, H. Kim
We have created the NLP4NLP corpus to study the content of scientific publications in the field of speech and natural language processing. It contains articles published in 34 major conferences and journals in that field over a period of 50 years (1965-2015), comprising 65,000 documents from 50,000 authors, including 325,000 references and representing approximately 270 million words. Most of these publications are in English; some are in French, German, or Russian. Some are open access; others have been provided by the publishers. To constitute and analyze this corpus, several tools have been used or developed. Some of them use Natural Language Processing methods that have themselves been published in the corpus, hence its name. Numerous manual corrections were necessary, which demonstrated the importance of establishing standards for uniquely identifying authors, publications, and resources. We have conducted various studies: the evolution over time of the number of articles and authors, collaborations between authors, citations between papers and authors, the evolution of research themes and the identification of the authors who introduced them, measures of innovation and detection of epistemological ruptures, the use of language resources, and the reuse of articles and plagiarism, in the context of a global or comparative analysis across sources.