This letter presents a robust voice activity detection (VAD) algorithm for noisy environments. The proposed VAD uses an entropy measure defined in the band-splitting spectrum domain to exploit the formant frequency representation as a highly efficient, compact representation of the time-varying characteristics of speech. Additionally, the Teager energy operator (TEO) can be applied prior to the entropy-based measurement to provide a better representation of formant information, resulting in high speech/non-speech classification performance. The results show that the proposed algorithm performs better overall than the standard ITU-T G.729B VAD and Shen's entropy-based VAD.
{"title":"Voice Activity Detection Algorithm with Low Signal-to-Noise Ratios Based on Spectrum Entropy","authors":"Kun-Ching Wang, Y. Tsai","doi":"10.1109/ISUC.2008.55","DOIUrl":"https://doi.org/10.1109/ISUC.2008.55","url":null,"abstract":"This letter presents a robust voice activity detection (VAD) algorithm for detecting voice activity in noisy environments. The presented robust VAD utilizes the entropy measurement defined in band-splitting spectrum domain to exploit the formant frequency representation as a highly efficient, compact representation of the time-varying characteristics of speech. Additionally, Teager energy operator (TEO) can be employed to provide a better representation of formant information resulting in high performance of classification of speech/non-speech priori to entropy-based measurement. The results show that the proposed algorithm has an overall better performance than the standard ITU-T G.729B VAD and Shen's entropy-based VAD.","PeriodicalId":339811,"journal":{"name":"2008 Second International Symposium on Universal Communication","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134012002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
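The two building blocks named in the abstract above, the Teager energy operator and a spectrum-based entropy, can be illustrated with a minimal sketch. This is not the authors' band-splitting algorithm: it assumes the textbook discrete TEO and a plain Shannon entropy over the whole power spectrum, and the test frames (`tone`, `noise`) and frame length `N` are hypothetical.

```python
import cmath
import math
import random


def teager_energy(x):
    """Discrete Teager energy: psi[n] = x[n]^2 - x[n-1] * x[n+1] (interior samples only)."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def power_spectrum(x):
    """Naive O(N^2) DFT power spectrum over the non-negative frequency bins."""
    n = len(x)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_entropy(x):
    """Shannon entropy (in nats) of the power spectrum normalised to sum to 1."""
    power = power_spectrum(x)
    total = sum(power)
    if total == 0.0:
        return 0.0
    probs = [p / total for p in power]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical test frames: a pure tone (peaky spectrum) and white noise (flat spectrum).
N = 128
omega = 2 * math.pi * 8 / N
tone = [math.sin(omega * t) for t in range(N)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(N)]

tone_entropy = spectral_entropy(tone)
noise_entropy = spectral_entropy(noise)
tone_teo = teager_energy(tone)
```

A structured signal such as the tone concentrates its power in few bins and so has low spectral entropy, while broadband noise has high entropy; an entropy-based detector thresholds on that difference. For a sinusoid of amplitude A and digital frequency omega, the TEO output is the constant A^2 sin^2(omega).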
In this invited talk, depth estimation and object recognition using integral imaging are presented. The computational method reconstructs three-dimensional information at arbitrary depth-levels, eliminating the occluding effect in order to visualize the object of interest. The depth of the object is estimated where the uncertainty of the corresponding intensities is minimized. Various applications including edge detection and statistical pattern recognition can be performed using the 3D information acquired by integral imaging.
{"title":"Depth Estimation and Object Recognition using Integral Imaging (Invited Paper)","authors":"S. Yeom","doi":"10.1109/ISUC.2008.88","DOIUrl":"https://doi.org/10.1109/ISUC.2008.88","url":null,"abstract":"In this invited talk, depth estimation and object recognition using integral imaging are presented. The computational method reconstructs three-dimensional information at arbitrary depth-levels, eliminating the occluding effect in order to visualize the object of interest. The depth of the object is estimated where the uncertainty of the corresponding intensities is minimized. Various applications including edge detection and statistical pattern recognition can be performed using the 3D information acquired by integral imaging.","PeriodicalId":339811,"journal":{"name":"2008 Second International Symposium on Universal Communication","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134113156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
T. Okada, J. Nakata, R. Beuran, Yasuo Tan, Y. Shinoda
Recently, mobile robots have come to operate in various settings such as disaster areas, office buildings, factories, and homes. Before these new mobile robots are released, evaluation should confirm that they work correctly and safely. In this paper, we propose a large-scale simulation environment for mobile robots on StarBED, a large-scale networked testbed. Using the simulation environment, we confirmed that four hundred mobile robots can act in real time.
{"title":"Large-scale Simulation Method of Mobile Robots","authors":"T. Okada, J. Nakata, R. Beuran, Yasuo Tan, Y. Shinoda","doi":"10.1109/ISUC.2008.42","DOIUrl":"https://doi.org/10.1109/ISUC.2008.42","url":null,"abstract":"Recently mobile robots act in various situations such as disaster areas, office buildings, factories and homes. When these new mobile robots are released, they should be confirmed to work correctly and safety from evaluation. In this paper, we propose large-scale simulation environment of mobile robots on StarBED which is a large-scale networked testbed. By using the simulation environment, we could confirm that four hundreds of mobile robots act in real-time.","PeriodicalId":339811,"journal":{"name":"2008 Second International Symposium on Universal Communication","volume":"164 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123155812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Summary form only given. Computers have become an essential part of modern life, providing services in a multiplicity of ways. Access to these services, however, comes at a price: human attention is bound and directed toward a technical artifact in a human-machine interaction setting at the expense of time and attention for other humans. This paper explores a new class of computer services that support human-human interaction and communication implicitly and transparently. Computers in the human interaction loop (CHIL) require consideration of all communication modalities, multimodal integration, and more robust performance. We review the technologies and several CHIL services providing human-human support. Among them, we specifically highlight advanced computer services for cross-lingual communication.
{"title":"Speech Processing in Support of Human-Human Communication (Invited Paper)","authors":"A. Waibel","doi":"10.1109/ISUC.2008.78","DOIUrl":"https://doi.org/10.1109/ISUC.2008.78","url":null,"abstract":"Summary form only given. Computers have become an essential part of modern life, providing services in a multiplicity of ways. Access to these services, however, comes at a price: human attention is bound and directed toward a technical artifact in a human machine interaction setting at the expense of time and attention for other humans. This paper explores a new class of computer services that support human-human interaction and communication implicitly and transparently. Computers in the human interaction loop (CHIL), require consideration of all communication modalities, multimodal integration and more robust performance. We review the technologies and several CHIL services providing human-human support. Among them, we specifically highlight advanced computer services for cross-lingual communication.","PeriodicalId":339811,"journal":{"name":"2008 Second International Symposium on Universal Communication","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124276009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chiori Hori, Kiyonori Ohtake, Teruhisa Misu, H. Kashioka, Satoshi Nakamura
We have proposed an efficient approach to managing a dialog system using a weighted finite-state transducer (WFST) in which users' concept tags and the system's action tags are the input and output of the transducer, respectively. A WFST for dialog management was automatically created using a corpus annotated with the interchange format (IF), an interlingua for machine translation consisting of dialog acts and arguments. A word-to-concept WFST for spoken language understanding (SLU) was created using the same corpus. The scenario and SLU WFSTs acquired from the corpus were composed together and then optimized. We have confirmed that the WFST automatically trained using the annotated corpus can manage dialog reasonably.
{"title":"A Statistical Approach to Expandable Spoken Dialog Systems using WFSTs","authors":"Chiori Hori, Kiyonori Ohtake, Teruhisa Misu, H. Kashioka, Satoshi Nakamura","doi":"10.1109/ISUC.2008.61","DOIUrl":"https://doi.org/10.1109/ISUC.2008.61","url":null,"abstract":"We have proposed an efficient approach to manage a dialog system using a weighted finite-state transducer (WFST) in which users' concept and system's action tags are input and output of the transducer, respectively. A WFST for dialog management was automatically created using a corpus annotated with inter-change format (IF) consisting of dialog acts and argument which is an interlingua for machine translation. A word-to-concept WFST for spoken language understanding (SLU) was created using the same corpus. The scenario and SLU WFSTs acquired from the corpus were composed together and then optimized. We have confirmed the WFST automatically trained using the annotated corpus can manage dialog reasonably.","PeriodicalId":339811,"journal":{"name":"2008 Second International Symposium on Universal Communication","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128910220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
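The idea of a transducer whose input is a sequence of user concept tags and whose output is a sequence of system action tags can be sketched as a toy deterministic weighted transducer. This is only an illustration under stated assumptions: the tag names, states, and weights are hypothetical, and the sketch does not perform the composition or optimization steps the abstract mentions.

```python
# Transitions: (state, input concept tag) -> (next state, output action tag, weight).
# Weights are treated as costs (for example negative log-probabilities); lower is better.
# All states, tags, and weights below are hypothetical.
TRANSITIONS = {
    (0, "greet"): (1, "greet_back", 0.1),
    (1, "request_info"): (2, "provide_info", 0.4),
    (2, "thank"): (3, "close", 0.2),
}


def transduce(concepts, start=0):
    """Map a sequence of concept tags to action tags, accumulating the path cost."""
    state = start
    actions = []
    cost = 0.0
    for concept in concepts:
        key = (state, concept)
        if key not in TRANSITIONS:
            raise KeyError(f"no transition from state {state} on input {concept!r}")
        state, action, weight = TRANSITIONS[key]
        actions.append(action)
        cost += weight
    return actions, cost


actions, cost = transduce(["greet", "request_info", "thank"])
```

In a real WFST toolkit the transducer would generally be non-deterministic, and the best action sequence would be found by a shortest-path search over the composed machine rather than by a single table lookup per input.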
In this paper, we report the results from a support group of cancer patients who use a 3-dimensional chat system. We examined the sequences of their messages and the intervals between them in the chat logs to evaluate community development from conversational data. The users tended to send serious messages during long intervals, and the roles they played in the conversations changed as a sense of community emerged.
{"title":"Analysis of Community Development using Chat Logs: A Virtual Support Group of Cancer Patients","authors":"Kanayo Ogura, T. Kusumi, A. Miura","doi":"10.1109/ISUC.2008.32","DOIUrl":"https://doi.org/10.1109/ISUC.2008.32","url":null,"abstract":"In this paper, we report the results of a support group of cancer patients who use a 3-dimensional chat system. We examined the sequences of their messages and intervals with chat logs to evaluate the community development by conversational data. The users tended to send serious messages during long intervals, and the roles they played in the conversations changed as a sense of community emerged.","PeriodicalId":339811,"journal":{"name":"2008 Second International Symposium on Universal Communication","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125633237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Giving notifications in a timely manner is an essential service that a smart home should provide. Communication of a resident's data to their doctor, health care provider, or family member via email, phone call, or text message is indispensable. To reduce cost, improve extensibility, and reduce the developer's burden and learning curve, we propose a service-oriented notification system. The system offers several simple services that cover essential communication needs; by composing these services with applications, more sophisticated needs can also be satisfied. A detailed description and the architecture are elaborated below.
{"title":"Composition of Services for Notification in Smart Homes","authors":"J. M. Álamo, Tanmoy Sarkar, Johnny S. Wong","doi":"10.1109/ISUC.2008.65","DOIUrl":"https://doi.org/10.1109/ISUC.2008.65","url":null,"abstract":"Giving notifications in a timely manner is an essential service that smart home should provide. Communication of resident's data to their doctor, health care provider or family member via email, phone call, or text message is indispensable. In order to reduce the cost, improve extendibility and reduce the developer's burden and learning curve, we propose a service-oriented notification system. This system has different simple services which provide essential communication needs, and by composing these services with applications more sophisticated needs can also be satisfied. A detailed description and the architecture are elaborated below.","PeriodicalId":339811,"journal":{"name":"2008 Second International Symposium on Universal Communication","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121283145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper shows that a graph-based co-clustering approach is suitable for extracting verb synonyms from large-scale texts. The proposed bipartite graph algorithm can produce clusters of verb synonyms as well as noun synonyms by taking into account word co-occurrence between a verb and its argument. Experimental results show that the co-clustering approach achieves higher accuracy than the vector-based single clustering approach usually used for thesaurus construction.
{"title":"Extraction of Verb Synonyms using Co-clustering Approach","authors":"Koichi Takeuchi","doi":"10.1109/ISUC.2008.66","DOIUrl":"https://doi.org/10.1109/ISUC.2008.66","url":null,"abstract":"This paper describes that a graph-based co-clustering approach is suitable for extraction of verb synonyms from large scale texts. The proposed bipartite graph algorithm can produce clusters of verb synonyms as well as noun synonyms taking into account word co-occurrence between verb and its argument. Experimental results show that the co-clustering approach achieve higher accuracy than those by a vector-based single clustering approach that are usually used for construction of thesaurus.","PeriodicalId":339811,"journal":{"name":"2008 Second International Symposium on Universal Communication","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124099877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we propose a robust speech feature extraction algorithm for automatic speech recognition that reduces the effect of noise in the temporal modulation domain. The algorithm processes the time series of cepstral coefficients in two steps. The first step applies modulation contrast normalization so that the temporal modulation contrast of clean and noisy speech falls in the same range. The second step applies edge-preserved smoothing to attenuate the low modulation components while preserving the high modulation components (edges). We tested the algorithm in speech recognition experiments under both an additive noise condition (AURORA-2J data corpus) and a reverberant noise condition (clean speech utterances from AURORA-2J convolved with a smart-room impulse response). The ETSI advanced front-end algorithm (AFE) was used for comparison. The algorithm achieved: (1) for additive noise, a 57.26% relative word error reduction (RWER) rate for clean-condition training (59.37% for AFE) and a 33.52% RWER rate for multi-condition training (35.77% for AFE); and (2) for reverberant noise, a 51.28% RWER rate (10.17% for AFE).
{"title":"Normalization on Temporal Modulation Transfer Function for Robust Speech Recognition","authors":"Xugang Lu, Shigeki Matsuda, Tohru Shimizu, Satoshi Nakamura","doi":"10.1109/ISUC.2008.74","DOIUrl":"https://doi.org/10.1109/ISUC.2008.74","url":null,"abstract":"In this paper, we proposed a robust speech feature extraction algorithm for automatic speech recognition which reduced the noise effect in the temporal modulation domain. The proposed algorithm has two steps to deal with the time series of cepstral coefficients. The first step adopted a modulation contrast normalization to normalize the temporal modulation contrast of both clean and noisy speech to be in the same range. The second step adopted an edge-preserved smoothing to attenuate the low modulation components while preserving the high modulation components (edges). We tested our algorithms on speech recognition experiments in both additive noise condition (AURORA-2J data corpus) and reverberant noise condition (convolution of clean speech utterances from AURORA-2J with a smart room impulse response signal). For comparison, the ETSI advanced front-end algorithm (AFE) is used. Our results showed that the algorithm got: (1) for additive noise, 57.26% relative word error reduction (RWER) rate for clean conditional training (59.37% for AFE), and 33.52% RWER rate for multi-conditional training (35.77% for AFE), and (2) for reverberant noise, 51.28% RWER rate (10.17% for AFE).","PeriodicalId":339811,"journal":{"name":"2008 Second International Symposium on Universal Communication","volume":"329 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133993014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
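The abstract above reports relative word error reduction (RWER) rates without defining the measure. A minimal sketch of the arithmetic follows, assuming the commonly used definition (the reduction in word error rate relative to a baseline system); the abstract does not state which baseline was used, and the word error rates in the example are hypothetical, not taken from the paper.

```python
def relative_word_error_reduction(baseline_wer, system_wer):
    """RWER in percent: how much of the baseline word error rate the system removes.

    Assumed definition: 100 * (baseline_wer - system_wer) / baseline_wer.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


# Hypothetical example: the baseline has 40% WER and the system has 20% WER,
# so half of the baseline errors are removed.
example_rwer = relative_word_error_reduction(40.0, 20.0)
```

Under this definition a higher RWER is better, which is how the figures quoted for the proposed algorithm and for AFE are compared.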
S. Cardey, P. Greenfield, Raksi Anantalapochai, Mohand Beddar, D. DeVitre, Gan Jin
In the context of crises in which emergency services or the general population speak different languages, effective interoperability requires not only that messages and alerts be translated rapidly but also, the task being safety critical, that there be no errors. We have developed a methodology based on linguistic norms and a supporting mathematical model for the construction of a single source controlled language to be machine translated to specific target controlled languages. In this paper we discuss in particular the architecture of our machine translation system, which is based on the 'canonical' case where there are no language divergences (identical source and target languages) and the 'variant' cases encompassing the divergences between each target controlled language and our source controlled language. We explain how we classify and organize the divergences in a declarative manner so that they can be incorporated in the machine translation process.
{"title":"Modelling of Multiple Target Machine Translation of Controlled Languages Based on Language Norms and Divergences","authors":"S. Cardey, P. Greenfield, Raksi Anantalapochai, Mohand Beddar, D. DeVitre, Gan Jin","doi":"10.1109/ISUC.2008.13","DOIUrl":"https://doi.org/10.1109/ISUC.2008.13","url":null,"abstract":"In the context of crises in which emergency services or the general population are of different languages, effective interoperability requires not only that translations of messages and alerts be done rapidly but also, being safety critical, that there be no errors. We have developed a methodology based on linguistic norms and a supporting mathematical model for the construction of a single source controlled language to be machine translated to specific target controlled languages. In this paper we discuss in particular the architecture of our machine translation system which is based on the 'canonical' case where there are no language divergences (identical source and target languages), and the 'variant' cases encompassing the divergences between each target controlled language and our source controlled language. We explain the way that we classify and organize the divergences in a declarative manner so as to be incorporated in the machine translation process.","PeriodicalId":339811,"journal":{"name":"2008 Second International Symposium on Universal Communication","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128282566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}