Pub Date : 2006-12-01 DOI: 10.1109/ISPACS.2006.364902
A. N. Iyer, U. Ofoegbu, R. Yantorno, B. Y. Smolenski
A novel approach to speaker clustering in telephone conversations is presented in this paper. The method is based on the simple observation that the distance between populations of feature vectors extracted from different speakers is greater than a preset threshold. This observation is incorporated into the clustering problem by formulating a constrained optimization problem, and a modified c-means algorithm is designed to solve it. Another key aspect of speaker clustering is determining the number of clusters, which traditional methods either assume or expect as an input. The proposed method does not require such information; instead, the number of clusters is determined automatically from the data. The performance of the proposed algorithm with the Hellinger, Bhattacharyya, Mahalanobis, and generalized likelihood ratio distance measures is evaluated and compared. The approach employing the Hellinger distance resulted in an average cluster purity of 0.85 in experiments performed on the Switchboard telephone conversational speech database. This result indicates a 9% relative improvement in average cluster purity compared with the best-performing agglomerative clustering system.
Title : Blind Speaker Clustering
Venue : 2006 International Symposium on Intelligent Signal Processing and Communications
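The abstract above compares several distances between populations of feature vectors. As a minimal sketch, assuming each population is summarized by a multivariate Gaussian (mean vector and covariance matrix, an assumption not stated in the abstract), the Bhattacharyya and Hellinger distances have the closed forms below. The function names are illustrative and are not taken from the paper.

```python
import numpy as np

def bhattacharyya_gaussian(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two multivariate Gaussians."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    cov1, cov2 = np.asarray(cov1, float), np.asarray(cov2, float)
    cov = 0.5 * (cov1 + cov2)              # averaged covariance
    diff = mu1 - mu2
    # Mean-separation term: (1/8) * diff^T cov^{-1} diff
    term_mean = 0.125 * diff @ np.linalg.solve(cov, diff)
    # Covariance-mismatch term, computed with log-determinants for stability
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    term_cov = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return term_mean + term_cov

def hellinger_gaussian(mu1, cov1, mu2, cov2):
    """Hellinger distance in [0, 1], via H^2 = 1 - exp(-D_Bhattacharyya)."""
    db = bhattacharyya_gaussian(mu1, cov1, mu2, cov2)
    return float(np.sqrt(max(0.0, 1.0 - np.exp(-db))))
```

Under this Gaussian assumption, two segments could be treated as different speakers when the distance exceeds a chosen threshold; how the threshold is set is not described in the abstract.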
Pub Date : 2006-12-01 DOI: 10.1109/ISPACS.2006.364733
P. Tsai, T. Lee, T. Chiueh
This paper presents a fast Fourier transform (FFT) processor suitable for the IEEE 802.16e (WiMax) OFDM mode. FFT/IFFT processors are crucial in OFDM transceivers, and they usually consume considerable power and occupy a large area. The proposed FFT processor combines the pipelined architecture and the memory-based architecture so that it can operate at the sample rate and thus achieve power efficiency. The processor is based on the multipath delay commutator architecture with high-radix arithmetic units and two main memories for input buffering, intermediate storage, and output reordering. A proposed conflict-free memory addressing strategy makes continuous-flow FFT processing possible. Simulation results show that it achieves a 29% saving in power consumption.
Title : Power-Efficient Continuous-Flow Memory-Based FFT Processor for WiMax OFDM Mode
Venue : 2006 International Symposium on Intelligent Signal Processing and Communications
Pub Date : 2006-12-01 DOI: 10.1109/ISPACS.2006.364706
Dongguo Li, Katsumi Yamashita
Intercarrier interference (ICI) is well known as the main barrier to improving transmission performance in mobile OFDM systems. Pilot-aided techniques are regarded as an effective solution, even though some bandwidth has to be sacrificed. Blind and semi-blind channel estimation methods, which exploit underlying statistical properties of the transmitted data, have also been proposed, but they increase system complexity. In this paper, an effective pilot-free channel estimation and equalization method is proposed. It can not only significantly mitigate the ICI and improve the BER performance but also improve the transmission efficiency of the system while remaining practical. Monte Carlo simulations show that the pilot-free method can approach the performance obtained with a known channel.
Title : Channel Estimation Based on Pilot-Free Method for Mobile OFDM Systems
Venue : 2006 International Symposium on Intelligent Signal Processing and Communications
In this paper, a method for synthesizing arbitrary-view images using 16 cameras arranged on a plane is examined. First, we propose a technique for accurately estimating the depth between the cameras and objects. Next, realistic arbitrary-view images are synthesized by appropriately combining the 3D shape model estimated from the depth information with the actual images. Images at arbitrary viewpoints can be generated without errors due to occlusion.
Title : Synthesis of Arbitrary View Images Using Depth Estimation Based on Iterative Comparison
Authors : Yasuyuki Haruta, Akira Kubota, Ryutaro Oi, Takayuki Hamamoto
Pub Date : 2006-12-01 DOI: 10.1109/ISPACS.2006.364721
Venue : 2006 International Symposium on Intelligent Signal Processing and Communications
Pub Date : 2006-12-01 DOI: 10.1109/ISPACS.2006.364804
M. Suzuki, Ming Zhang, Haihua Chen, Tingting Teng
This paper proposes a novel method for designing criteria that detect the number of signals superimposed in multichannel time series. Based on the probabilistic properties of the difference in maximum log likelihood at infinite SNR, penalty functions in information theoretic criteria are designed by specifying upper bounds on the error probabilities. The proposed design method uses an approximation of the probability distribution functions of the difference in maximum log likelihood. Finally, simulation results demonstrate that flexible criteria for detecting the number of signals can be designed both when the number of available samples of observation vectors is small and when it is large.
Title : A Design of Information Theoretic Criteria for Detecting the Number of Incoherent Signals
Venue : 2006 International Symposium on Intelligent Signal Processing and Communications
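For context on the entry above, information theoretic criteria of this kind combine a log-likelihood term computed from the eigenvalues of the sample covariance matrix with a penalty on the number of free parameters. The sketch below uses the classical AIC and MDL penalties as stand-ins; the penalty functions designed in the paper are not given in the abstract and are not reproduced here. The function names and the pluggable `penalty` argument are illustrative assumptions.

```python
import numpy as np

def mdl_penalty(k, m, n):
    """Classical MDL penalty for k signals, m sensors, n snapshots."""
    return 0.5 * k * (2 * m - k) * np.log(n)

def aic_penalty(k, m, n):
    """Classical AIC penalty (scaled to match the likelihood term below)."""
    return k * (2 * m - k)

def estimate_num_signals(eigvals, n_samples, penalty=mdl_penalty):
    """Pick the k that minimizes (negative log likelihood + penalty)."""
    lam = np.sort(np.asarray(eigvals, float))[::-1]   # descending eigenvalues
    m = lam.size
    scores = []
    for k in range(m):
        tail = lam[k:]                                # presumed noise eigenvalues
        # log(geometric mean / arithmetic mean) of the tail, always <= 0
        log_ratio = np.mean(np.log(tail)) - np.log(np.mean(tail))
        neg_loglik = -n_samples * (m - k) * log_ratio
        scores.append(neg_loglik + penalty(k, m, n_samples))
    return int(np.argmin(scores))
```

A different penalty can be passed as any function of `(k, m, n)`, which is where a designed criterion would plug in.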
Pub Date : 2006-12-01 DOI: 10.1109/ISPACS.2006.364751
S. Nagashima, T. Aoki, T. Higuchi, K. Kobayashi
This paper presents a high-accuracy image matching technique using a phase-only correlation (POC) function. POC-based image matching enables estimation of image displacements with 1/10- to 1/100-pixel accuracy by fitting a function to the closed-form representation of the POC function's peak. This method requires an iterative process for the nonlinear function fitting, resulting in long computation times. In this paper, we propose a peak evaluation formula (PEF) that directly estimates the correlation peak location from the actual 2-D data array of the POC function. Experimental evaluation shows that the proposed method reduces computation time without sacrificing image matching accuracy.
Title : A Subpixel Image Matching Technique Using Phase-Only Correlation
Venue : 2006 International Symposium on Intelligent Signal Processing and Communications
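The POC function mentioned in the entry above is the inverse DFT of the normalized cross-power spectrum of two images. The sketch below computes it with NumPy and locates the peak at integer-pixel resolution only; it does not implement the function fitting or the peak evaluation formula (PEF) described in the abstract. The function names and the small `eps` regularizer are assumptions made for illustration.

```python
import numpy as np

def phase_only_correlation(f, g, eps=1e-12):
    """2-D POC surface of images f and g (same shape)."""
    F = np.fft.fft2(f)
    G = np.fft.fft2(g)
    cross = G * np.conj(F)                  # cross-power spectrum
    # Keep only the phase; eps avoids division by zero at empty bins
    r = np.fft.ifft2(cross / (np.abs(cross) + eps))
    return np.real(r)

def integer_shift(f, g):
    """Integer displacement d such that g is approximately np.roll(f, d)."""
    poc = phase_only_correlation(f, g)
    peak = np.unravel_index(np.argmax(poc), poc.shape)
    # Map peak indices above half the image size to negative shifts
    return tuple(int(p) if p <= n // 2 else int(p) - n
                 for p, n in zip(peak, poc.shape))
```

For a pure circular shift the POC surface is close to a unit impulse at the displacement; subpixel methods refine the estimate using the values around this peak.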
Pub Date : 2006-12-01 DOI: 10.1109/ISPACS.2006.364734
N. Takagi, K. Takagi
A VLSI algorithm for integer square-rooting is proposed. It is based on the radix-2 non-restoring square-rooting algorithm. Fast computation is achieved by the use of the radix-2 signed-digit representation. Nonetheless, the algorithm does not require normalization of the operand. Combinational (unfolded) implementation of the algorithm yields a regularly structured array square-rooter. Its delay is proportional to n, the bit length of the operand, while that of conventional ones is at least proportional to n log n.
Title : A VLSI Algorithm for Integer Square-Rooting
Venue : 2006 International Symposium on Intelligent Signal Processing and Communications
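As background for the entry above, radix-2 square-rooting produces one result bit per step. The sketch below is the familiar software form of the digit-by-digit recurrence with a compare-then-subtract step (that is, a restoring-style variant). It is not the non-restoring signed-digit algorithm of the paper, and the function name is hypothetical.

```python
def isqrt_digit_by_digit(x: int) -> int:
    """Floor of the square root of a non-negative integer, one bit per step."""
    if x < 0:
        raise ValueError("x must be non-negative")
    bit = 1
    while bit * 4 <= x:        # highest power of four not exceeding x
        bit *= 4
    root, rem = 0, x
    while bit:
        if rem >= root + bit:  # trial subtraction succeeds: result bit is 1
            rem -= root + bit
            root = (root >> 1) + bit
        else:                  # result bit is 0
            root >>= 1
        bit >>= 2
    return root
```

Like the algorithm in the abstract, this form works directly on the integer operand without normalizing it first.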
Pub Date : 2006-12-01 DOI: 10.1109/ISPACS.2006.364899
Uchechukwu Ofoegbu, Ananth N. Iyer, Robert E. Yantorno, B. Y. Smolenski
In telephone conversations, only short consecutive utterances can be examined for each speaker; therefore, discriminating between speakers in such conversations is a challenging task, which becomes even more challenging when no information about the speakers is known a priori. In this paper, a technique for determining the number of speakers participating in a telephone conversation is presented. This approach assumes no knowledge of any of the participating speakers. The technique is based on comparing short utterances within the conversation and deciding whether or not they belong to the same speaker. The applications of this research include three-way call detection and speaker tracking, and it could be extended to speaker change-point detection and indexing. The proposed method involves an elimination process in which speech segments matching a chosen set of reference models are sequentially removed from the conversation. Models are formed using the mean vectors and covariance matrices of linear predictive cepstral coefficients of voiced segments in the conversation. The use of the Mahalanobis distance to determine whether two models belong to the same speaker or to different speakers, based on likelihood ratio testing, is investigated. The relative amount of residual speech is observed after each elimination process to determine whether an additional speaker is present. Experiments were performed on 4000 artificial conversations from the HTIMIT database. The proposed system yielded an average speaker count accuracy of 78%.
Title : A Speaker Count System for Telephone Conversations
Venue : 2006 International Symposium on Intelligent Signal Processing and Communications
Pub Date : 2006-12-01 DOI: 10.1109/ISPACS.2006.364779
R. Suleesathira, N. Phaisal-atsawasenee
The performance of beamspace MUSIC (BMUSIC) deteriorates as coherent arrivals become closely spaced. In this paper, we improve BMUSIC by using forward-backward averaging of combined signal eigenvectors. Evaluations are given to illustrate the capability of the proposed method. A performance analysis of BPSK and QPSK modulation with antenna arrays is derived and compared in terms of bit error rates (BERs).
Title : Decorrelation for BPSK and QPSK Coherent Arrivals
Venue : 2006 International Symposium on Intelligent Signal Processing and Communications
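Forward-backward averaging, referred to in the entry above, is commonly written for an array covariance matrix R as (R + J R* J) / 2, where J is the exchange (anti-identity) matrix and R* is the complex conjugate. The sketch below applies this standard element-space form to a uniform linear array; the paper applies the idea to combined signal eigenvectors in beamspace, which is not reproduced here. The half-wavelength spacing, the steering-vector helper, and all names are assumptions for illustration.

```python
import numpy as np

def steering_vector(theta_deg, n_sensors):
    """Uniform linear array with half-wavelength spacing (assumed geometry)."""
    m = np.arange(n_sensors)
    return np.exp(1j * np.pi * m * np.sin(np.deg2rad(theta_deg)))

def forward_backward_average(R):
    """Return 0.5 * (R + J conj(R) J) with J the exchange matrix."""
    R = np.asarray(R, dtype=complex)
    J = np.eye(R.shape[0])[::-1]            # anti-identity matrix
    return 0.5 * (R + J @ R.conj() @ J)

# Two fully coherent arrivals: the second is a phase-shifted copy of the first
v = steering_vector(10.0, 6) + np.exp(0.7j) * steering_vector(25.0, 6)
R_coherent = np.outer(v, v.conj())          # noise-free covariance, rank 1
R_fb = forward_backward_average(R_coherent)
```

In this noise-free example the coherent covariance has rank one, so a subspace method would see a single source; after averaging, the rank is two, which is the decorrelating effect that makes both arrivals resolvable.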
Pub Date : 2006-12-01 DOI: 10.1109/ISPACS.2006.364785
A. Haggag, M. Ghoneim, Jianming Lu, T. Yahagi
In this paper, we propose a progressive encryption and controlled access scheme for JPEG 2000 encoded images. Our scheme applies the SNOW 2 stream cipher to JPEG 2000 codestreams in a way that preserves most of the inherent flexibility of JPEG 2000 encoded images and enables untrusted intermediate network transcoders to downstream an encrypted JPEG 2000 image without access to decryption keys. Our scheme can also control access to various image resolutions or quality layers by granting users different levels of access through different decryption keys. Our scheme preserves most of the inherent flexibility, scalability, and transcodability of encrypted JPEG 2000 images and also preserves end-to-end security.
Title : Progressive Encryption and Controlled Access Scheme for JPEG 2000 Encoded Images
Venue : 2006 International Symposium on Intelligent Signal Processing and Communications