Pub Date : 2007-09-01 DOI: 10.1109/ELMAR.2007.4418823
R. Talafová, G. Rozinaj, J. Cepko
This project concerns speech synthesis and the creation of a speech synthesiser for a mobile phone. The first part covers speech synthesis. Of all the types of synthesis, only diphone synthesis is discussed further, because its properties suit a mobile phone better than those of the other types. The work then analyses the implementation of the speech synthesiser: loading the database, synthesis, creating the annotation file, and generating the output sound signal. The final synthesised utterance is played together with an animation of a talking human face. The second part describes the design and implementation of the face animation for a mobile phone. The last part presents conclusions and possible improvements to the synthesis.
Title : Speech synthesis for mobile phone (ELMAR 2007)
Pub Date : 2007-09-01 DOI: 10.1109/ELMAR.2007.4418798
G. Gvozden, M. Gosta, S. Grgić
Owing to its exceptional efficiency and performance, a number of today's global organizations and alliances have recognized and embraced the new and continually developing H.264/AVC compression method, which is designed for a broad range of video applications. This article describes the advantages of H.264/AVC in mobile communication systems with limited bandwidth. Thanks to its very efficient compression, H.264/AVC enables the transport of high-quality video at low data rates. To demonstrate this, we compared the H.264/AVC coding technique with the MPEG-4 ASP (Advanced Simple Profile) coding technique currently used in mobile systems. The quality of the encoded video test sequences was assessed with the peak signal-to-noise ratio (PSNR), video quality metric (VQM) and structural similarity (SSIM) objective quality measures. The results confirmed the high efficiency and performance that will make H.264/AVC a ubiquitous coding technique in the multimedia world in time to come.
Title : Comparison of H.264/AVC and MPEG-4 ASP coding techniques designed for mobile applications using objective quality assessment methods (ELMAR 2007)
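Of the three objective measures named in the abstract, PSNR is the simplest to state. The following is a minimal sketch of the standard PSNR definition for 8-bit samples, not the authors' measurement setup; the function name `psnr` and the sample pixel values are illustrative assumptions.

```python
import math

def psnr(reference, distorted, peak=255.0):
    """PSNR in dB between two equally sized sequences of pixel values."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("frames must be non-empty and the same size")
    # Mean squared error between the original and the decoded samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)

# Hypothetical luma samples from an original and a decoded frame.
ref = [52, 55, 61, 66, 70, 61, 64, 73]
dec = [54, 55, 60, 66, 69, 62, 64, 75]
value = psnr(ref, dec)
print(f"PSNR = {value:.2f} dB")
```

For a video sequence, a common convention is to compute PSNR per frame and average over the sequence; whether the paper does so is not stated in the abstract.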
Pub Date : 2007-09-01 DOI: 10.1109/ELMAR.2007.4418821
Z. Yermeche, N. Grbic
This paper presents a new microphone array method for enhancing speech signals in a noisy, reverberant environment. A time-delay estimation method is used for speech source localization. A subband kurtosis-weighted structure makes the localization method robust at high noise levels. The estimated inter-sensor time delays are used directly in an adaptive soft-constrained subband beamformer. Evaluation in a simulated environment with real speech sequences shows promising results.
Title : A delay-based constrained beamformer for blind speech enhancement and dereverberation (ELMAR 2007)
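The idea of estimating an inter-sensor time delay can be illustrated with the simplest possible estimator: pick the lag that maximizes the cross-correlation of two microphone signals. This sketch is a plain full-band cross-correlation, not the subband kurtosis-weighted method of the paper; the function `estimate_delay` and the synthetic signals are assumptions made for illustration.

```python
import random

def estimate_delay(x, y, max_lag):
    """Return the lag (in samples) at which y best matches a delayed copy of x."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n in range(len(x)):
            m = n + lag
            if 0 <= m < len(y):
                score += x[n] * y[m]  # cross-correlation at this lag
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Synthetic test: the second microphone hears the source 3 samples later.
rng = random.Random(0)
source = [rng.uniform(-1.0, 1.0) for _ in range(200)]
true_delay = 3
mic1 = source
mic2 = [0.0] * true_delay + source[:-true_delay]

lag = estimate_delay(mic1, mic2, max_lag=10)
print("estimated delay:", lag, "samples")
```

In a real array, the delay in samples divided by the sampling rate gives the time difference of arrival, which together with the microphone spacing constrains the source direction.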
Pub Date : 2007-09-01 DOI: 10.1109/ELMAR.2007.4418816
B. Sallberg, N. Grbic, I. Claesson
This paper focuses on real-time speech extraction using blind adaptive beamforming. The speech extraction is carried out using an approximation of the kurtosis measure in a subband domain. The introduced kurtosis approximation improves on a recently proposed approximation technique in which a locally quadratic criterion was solved at each iteration. The improvement introduced in this paper is to normalize this same criterion using a pre-processing automatic gain control unit, thereby making the algorithm invariant to the scale of the input signal. The proposed method outperforms the recent technique in terms of signal-to-interference ratio improvement. In addition, the increase in memory consumption and processing load due to the proposed improvement is comparatively low, which is often desirable in a real-time digital signal processor (DSP) implementation. Further, a real-time implementation of the method is carried out and results with real data are presented.
Title : Online blind speech extraction based on a locally quadratic kurtosis criteria and a preprocessing Automatic Gain Controller (ELMAR 2007)
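The scale-invariance argument can be made concrete with a small experiment. A fourth-moment criterion scales with the fourth power of the input gain, so a level change of 0.01 changes it by a factor of 1e-8; normalizing the level first removes that dependence. The sketch below uses a crude block-wise RMS normalizer as a stand-in for the paper's automatic gain control unit, and Laplacian samples as a stand-in for super-Gaussian speech; both are assumptions, as are all function names.

```python
import math
import random

def agc(samples, target_rms=1.0):
    """Scale a block to a fixed RMS level (a crude block-wise gain control)."""
    rms = math.sqrt(sum(v * v for v in samples) / len(samples))
    if rms == 0.0:
        return list(samples)
    return [v * target_rms / rms for v in samples]

def fourth_moment(samples):
    """Mean of x^4; for zero-mean, unit-power input this equals the kurtosis."""
    return sum(v ** 4 for v in samples) / len(samples)

rng = random.Random(1)
# Laplacian samples (super-Gaussian) versus Gaussian samples.
speech_like = [rng.choice((-1.0, 1.0)) * rng.expovariate(1.0) for _ in range(20000)]
noise_like = [rng.gauss(0.0, 1.0) for _ in range(20000)]
quiet_speech = [0.01 * v for v in speech_like]  # same signal, 40 dB lower

# Without normalization the criterion depends strongly on the input level.
raw_ratio = fourth_moment(quiet_speech) / fourth_moment(speech_like)
# With the gain control in front, the level no longer matters.
agc_ratio = fourth_moment(agc(quiet_speech)) / fourth_moment(agc(speech_like))

print("raw ratio:", raw_ratio)
print("ratio after AGC:", agc_ratio)
print("kurtosis, speech-like:", fourth_moment(agc(speech_like)))
print("kurtosis, noise-like:", fourth_moment(agc(noise_like)))
```

The speech-like signal shows the higher kurtosis, which is the property a kurtosis-based criterion exploits to separate speech from more Gaussian interference.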
Pub Date : 2007-09-01 DOI: 10.1109/ELMAR.2007.4418790
E. Noth, C. Hacker, A. Batliner
Articles on monomodal human-machine interaction (HMI) very often point out that results can be strongly improved if other modalities are taken into account. In this contribution we look at two different problems in HMI: the detection of emotion or user state, and the question of whether the user is currently interacting with the machine, with himself or with another person (On/Off-Focus). We present monomodal classification results for these two problems and discuss whether multimodal classification seems promising for each of them. Different fusion models are considered. The examples are taken from the German HMI projects "SmartKom" and "SmartWeb".
Title : Does multimodality really help? The classification of emotion and of On/Off-focus in multimodal dialogues - two case studies (ELMAR 2007)
Pub Date : 2007-09-01 DOI: 10.1109/ELMAR.2007.4418836
M. Hadzialic, S. Colo, A. Sarajlic
This paper presents a unified analytical approach to performance analysis in gamma-shadowed Nakagami-m and Rice fading channels. Instead of the lognormal probability density function (PDF), a gamma PDF is used for shadowing, while multipath fading is represented by the Nakagami-m and Rice distributions. A mathematical framework is developed for deriving key statistical parameters, such as the PDFs of the signal-to-noise ratio (SNR) and signal-to-interference ratio (SIR) for several special scenarios in the mobile channel, as well as performance metrics including outage probability. In this way, one can conclude that the presented results are reliable for a lognormal shadowing spread below 9 dB (note that the shadowing spread actually observed in macro-cells typically lies between 4 and 9 dB). The final results are remarkably simple and can serve as a quick way of assessing performance. In addition, the presented analytical expressions are suitable for asymptotic analysis, which is a significant feature for both the theoretical and practical aspects of their application.
Title : An analytical approach to probability of outage evaluation in gamma shadowed Nakagami-m and Rice fading channel (ELMAR 2007)
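The channel model in the abstract can be cross-checked numerically. The sketch below is a Monte Carlo estimate, not the paper's analytical expressions: it assumes the local mean SNR is gamma distributed (shadowing) and that, given the local mean, the instantaneous SNR under Nakagami-m fading is gamma distributed with shape m. The function name, parameter names and numerical values are illustrative assumptions, and the Rice case is not covered.

```python
import math
import random

def outage_probability(m, k, mean_snr, threshold, trials=50_000, seed=0):
    """Monte Carlo estimate of P(SNR < threshold), all quantities in linear scale.

    m         Nakagami-m fading parameter (m = 1 is Rayleigh fading)
    k         shape of the gamma shadowing (larger k means milder shadowing)
    mean_snr  area-mean SNR
    """
    rng = random.Random(seed)
    outages = 0
    for _ in range(trials):
        # Shadowing: gamma-distributed local mean with expectation mean_snr.
        local_mean = rng.gammavariate(k, mean_snr / k)
        # Nakagami-m fading: instantaneous power is gamma(m, local_mean / m).
        snr = rng.gammavariate(m, local_mean / m)
        if snr < threshold:
            outages += 1
    return outages / trials

# Sanity check: with m = 1 and almost no shadowing the channel is Rayleigh,
# whose outage probability is 1 - exp(-threshold / mean_snr).
p_light = outage_probability(m=1.0, k=1000.0, mean_snr=10.0, threshold=1.0)
p_rayleigh = 1.0 - math.exp(-1.0 / 10.0)
# Heavy shadowing (small k) with the same mean SNR.
p_heavy = outage_probability(m=1.0, k=1.0, mean_snr=10.0, threshold=1.0)

print("light shadowing:", p_light, "Rayleigh reference:", p_rayleigh)
print("heavy shadowing:", p_heavy)
```

A simulation of this kind is slow compared with a closed-form expression, which is the practical motivation for the simple analytical results the abstract describes.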
Pub Date : 2007-09-01 DOI: 10.1109/ELMAR.2007.4418827
S. Cakaj, M. Shefkiu
"Modernization of telecommunication infrastructure of PTK " is a Post and Telecommunication of Kosova (PTK) project. This project was initiated to replace most of the switching and access infrastructue of PTK that was very old, up to semi-automatic exchanges. For this project PTK decided to go for the latest switching technology Next Generation Network (NGN). The project consist of replacement of switches in 6 regions with the new access equipment that would use the same Soft Switch installed at central server room in Prishtina, capital of Kosova. The applied aproach by PTK to make a large step by moving from semi-electronic switches to NGN is presented by this paper.
Title : Migration from PSTN to NGN (ELMAR 2007)
Pub Date : 2007-09-01 DOI: 10.1109/ELMAR.2007.4418811
A. Vrábel, R. Vargic, I. Kotuliak
Subscriber databases are key elements of any network. The evolution from classical telecommunication networks to NGN has changed them, along with the services they provide. New functionality is required, and the databases must fulfil more complex tasks. In this article, we present subscriber databases and their evolution from the GSM network through UMTS to the newest technology, IMS. This trend includes the evolution from the home location register and authentication center to the home subscriber server, and the introduction of new storage architectures such as XDM and GUP.
Title : Subscriber databases and their evolution in mobile networks from GSM to IMS (ELMAR 2007)
Pub Date : 2007-09-01 DOI: 10.1109/ELMAR.2007.4418820
G. Feldhoffer
This paper presents a speaker-independent training method for continuous voice-to-facial-animation systems. An audiovisual database with multiple voices and the video information of only one speaker was created using dynamic time warping: the video information is aligned to the voices of several speakers. The fit is measured with subjective and objective tests. The suitability of implementations on mobile devices is discussed.
Title : Speaker independent continuous voice to facial animation on mobile platforms (ELMAR 2007)
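Dynamic time warping, the alignment technique named in the abstract, can be sketched in a few lines. This is the textbook algorithm on one-dimensional sequences with an absolute-difference cost, not the author's pipeline, which would operate on audio feature vectors; the function name `dtw` and the toy sequences are assumptions.

```python
def dtw(a, b):
    """Return (total cost, warping path) aligning sequence a to sequence b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover which frames were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path

# Two "utterances" with the same shape but different timing.
reference = [0, 1, 2, 3, 2, 1]
slower = [0, 0, 1, 2, 2, 3, 2, 1]
total, path = dtw(reference, slower)
print("cost:", total)
print("path:", path)
```

The warping path maps each frame of one recording to frames of the other, which is what allows video recorded with one speaker to be paired with the audio of another.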
Pub Date : 2007-09-01 DOI: 10.1109/ELMAR.2007.4418792
C. Tameze, R. Vincelette, N. Melikechi, V. Zeljkovic
We are investigating the use of laser-induced breakdown spectroscopy (LIBS) on blood samples from mice to detect the earliest stages of epithelial ovarian cancer (EOC). A laser turns a blood sample into a plasma, and the images produced are then analyzed. By comparing LIBS images of blood from EOC-positive mice with those of cancer-free mice, we aim to identify differences that allow detection of early-stage EOC. We apply an improved nonlinear diffusion filter to enhance relevant image edges and to remove noise and irrelevant texture.
Title : Empirical analysis of LIBS images using the nonlinear diffusion method (ELMAR 2007)
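The behaviour the abstract attributes to nonlinear diffusion, smoothing noise while keeping edges, can be seen in the classic Perona-Malik scheme. The sketch below is that classic filter with an exponential conductance, not the improved filter of the paper; the function names, the parameters `kappa` and `step`, and the synthetic test image are assumptions.

```python
import math
import random

def perona_malik(image, iterations=20, kappa=10.0, step=0.2):
    """Classic Perona-Malik diffusion on a 2-D list of gray values.

    kappa sets the gradient magnitude treated as an edge; step must not
    exceed 0.25 for this explicit 4-neighbour scheme to remain stable.
    """
    rows, cols = len(image), len(image[0])
    img = [[float(v) for v in row] for row in image]
    for _ in range(iterations):
        new = [row[:] for row in img]
        for r in range(rows):
            for c in range(cols):
                flux = 0.0
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        grad = img[rr][cc] - img[r][c]
                        # Conductance is near 1 for small gradients (noise)
                        # and near 0 for large ones (edges).
                        flux += math.exp(-((grad / kappa) ** 2)) * grad
                new[r][c] = img[r][c] + step * flux
        img = new
    return img

def spread(img, columns):
    """Difference between the largest and smallest value in the given columns."""
    values = [row[c] for row in img for c in columns]
    return max(values) - min(values)

# Synthetic image: dark left half, bright right half, plus mild noise.
rng = random.Random(2)
noisy = [[(0.0 if c < 4 else 100.0) + rng.uniform(-2.0, 2.0) for c in range(8)]
         for _ in range(8)]
smooth = perona_malik(noisy)

left, right = range(0, 4), range(4, 8)
print("noise spread before:", spread(noisy, left), "after:", spread(smooth, left))
edge_gap = min(row[c] for row in smooth for c in right) - max(
    row[c] for row in smooth for c in left)
print("edge contrast after filtering:", edge_gap)
```

The noise within each half shrinks while the step between the halves is left almost untouched, which is the property that makes this family of filters attractive before edge-based image analysis.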