Hardware error correction using local syndromes
Mohamed Mourad Hafidhi, E. Boutillon
Pub Date: 2017-10-03. DOI: 10.1109/SiPS.2017.8109995
2017 IEEE International Workshop on Signal Processing Systems (SiPS)

Increasing integration density offers designers the possibility to build very complex systems on a single chip. However, as the limits of integration are approached, circuit reliability has emerged as a critical concern. The loss of reliability increases with process, voltage, and temperature (PVT) variations. Faults can appear in circuits, affect system behaviour, and lead to system failure. It is therefore increasingly important to build more fault-tolerant, resilient systems. This paper proposes a new fault-tolerant scheme, Duplication with Syndrome-based Correction (DSC). Two criteria were considered to evaluate the proposed scheme: reliability (the probability that no error appears at the output of the architecture) and the hardware efficiency of the architecture. Results show that the DSC scheme reduces complexity by 32% compared to the classical Triple Modular Redundancy (TMR) scheme, while maintaining a level of reliability close to that of TMR. The paper also shows an example of signal processing applications in which DSC is used to protect the correlation function and the filters inside the tracking loops of a Global Positioning System (GPS) receiver.
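The abstract does not detail the DSC syndrome logic, but the reliability trade-off it cites against TMR can be illustrated with a small Monte-Carlo sketch (the fault model, probabilities, and all names here are assumptions for illustration): TMR votes out any single fault, while plain duplication only detects disagreement and fails silently when both copies fail.

```python
import random

def simulate(p_fault, trials=100_000, seed=1):
    """Monte-Carlo comparison of an unprotected module, duplication
    (detection only), and TMR (majority vote) under i.i.d. faults."""
    rng = random.Random(seed)
    plain_wrong = dup_undetected = tmr_wrong = 0
    for _ in range(trials):
        faults = [rng.random() < p_fault for _ in range(3)]
        plain_wrong += faults[0]
        # Duplication flags a mismatch; the only silent failure is when
        # both copies fail at once.
        dup_undetected += faults[0] and faults[1]
        # TMR's vote is wrong only if at least two of the three copies fail.
        tmr_wrong += sum(faults) >= 2
    return plain_wrong / trials, dup_undetected / trials, tmr_wrong / trials

plain, dup, tmr = simulate(0.01)
assert tmr < plain  # majority voting suppresses single faults
```

With a per-module fault probability p, the voted TMR output is wrong roughly 3p^2 of the time versus p for the unprotected module, which is the reliability level DSC aims to approach at lower cost.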
Design space exploration of dataflow-based Smith-Waterman FPGA implementations
S. Brunet, E. Bezati, M. Mattavelli
Pub Date: 2017-10-03. DOI: 10.1109/SiPS.2017.8109982

The paper presents the results of design space explorations for the implementation of the Smith-Waterman (S-W) algorithm for DNA and protein sequence alignment. Both the design exploration studies and the FPGA implementations are obtained by developing a dynamic dataflow program implementing the algorithm and by direct high-level synthesis (HLS) to FPGA HDL. The main feature of the obtained implementation is a low-latency, pipelinable multistage processing element (PE), providing a substantial decrease in resource utilization and an increase in computation throughput compared to state-of-the-art solutions. The implementation is also fully scalable and can be efficiently reconfigured according to the DNA sequence sizes and the performance requirements of the system architecture. It scales up to 250 MHz, obtaining 14,746 alignments/s using a single S-W core with 4 PEs, and up to 31.8 mega-alignments/min using 36 S-W cores on the same FPGA for sequences of 160×100 nucleotides.
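The per-cell recurrence that each S-W processing element evaluates in hardware can be sketched in software. A minimal scoring-only version (traceback and affine gaps omitted; the scoring parameters are assumptions, not the paper's):

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment score via the Smith-Waterman recurrence:
    H[i][j] = max(0, H[i-1][j-1] + s(a_i, b_j), H[i-1][j] + gap, H[i][j-1] + gap)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,    # gap in b
                          H[i][j - 1] + gap)    # gap in a
            best = max(best, H[i][j])
    return best

assert smith_waterman("GATTACA", "GATTACA") == 14  # 7 matches * 2
```

In hardware, the cells along an anti-diagonal are independent, which is what lets a systolic array of PEs pipeline this recurrence.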
Analysing the performance of divide-and-conquer sequential matrix diagonalisation for large broadband sensor arrays
Fraser K. Coutts, K. Thompson, Stephan Weiss, I. Proudler
Pub Date: 2017-10-03. DOI: 10.1109/SiPS.2017.8109976

A number of algorithms capable of iteratively calculating a polynomial matrix eigenvalue decomposition (PEVD) have been introduced. The PEVD extends the ordinary EVD to polynomial matrices and diagonalises a parahermitian matrix using paraunitary operations. Inspired by recent work towards a low-complexity divide-and-conquer PEVD algorithm, this paper analyses the performance of this algorithm, named divide-and-conquer sequential matrix diagonalisation (DC-SMD), for applications involving broadband sensor arrays of various dimensionalities. We demonstrate that by using the DC-SMD algorithm instead of a traditional alternative, PEVD complexity and execution time can be significantly reduced. This reduction is shown to be especially impactful for broadband multichannel problems involving large arrays.
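DC-SMD itself is beyond a short sketch, but the object it operates on can be illustrated: a parahermitian polynomial matrix is Hermitian at every point of the unit circle, which is why a single set of paraunitary operations can diagonalise it across all frequencies at once. A minimal NumPy construction (dimensions and coefficients are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
M, maxlag = 4, 3  # matrix size and maximum lag (arbitrary)

# Build a parahermitian R(z) = sum_{tau=-L..L} R[tau] z^-tau by drawing
# coefficients for tau >= 0 and enforcing R[-tau] = R[tau]^H.
coeffs = {}
R0 = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
coeffs[0] = R0 + R0.conj().T  # the lag-0 coefficient must be Hermitian
for tau in range(1, maxlag + 1):
    Rt = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
    coeffs[tau], coeffs[-tau] = Rt, Rt.conj().T

def evaluate(z):
    """Evaluate the polynomial matrix R(z) at a point z."""
    return sum(C * z ** -tau for tau, C in coeffs.items())

# On the unit circle R(z) is Hermitian, so its eigenvalues are real at
# every frequency; a PEVD captures this with one polynomial decomposition.
for f in np.linspace(0, 1, 8, endpoint=False):
    Rf = evaluate(np.exp(2j * np.pi * f))
    assert np.allclose(Rf, Rf.conj().T)
    assert np.allclose(np.linalg.eigvals(Rf).imag, 0, atol=1e-8)
```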
Direction-of-arrival estimation of coherent narrowband signals with arbitrary linear array
Xiao Chen, J. Xin, Nanning Zheng, A. Sano
Pub Date: 2017-10-01. DOI: 10.1109/SiPS.2017.8109984

In this paper, we consider the direction-of-arrival (DOA) estimation of coherent narrowband signals impinging on an arbitrary linear array in a computationally efficient way. A new interpolation-transform-based modified Capon beamforming method is proposed that requires no eigendecomposition: the arbitrary linear array is transformed into a virtual uniform linear array (ULA) using the interpolation technique, and the coherency of the incident signals is then decorrelated by spatial smoothing preprocessing. Further, by raising the array covariance matrix to a higher power, a modified Capon beamformer is used to estimate the DOAs, in which the component corresponding to the signal subspace is suppressed. The effectiveness of the proposed method is verified through numerical examples, and the simulation results show that it performs as well as the subspace-based method at low signal-to-noise ratio (SNR) or with a small number of snapshots.
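Two of the steps named above, spatial smoothing and Capon beamforming, can be sketched directly (starting from a ULA, so the interpolation step is skipped; the covariance-power modification is also not modelled, and geometry, SNR, and angles are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
M, Ms, N = 10, 6, 400  # sensors, subarray size, snapshots (assumed)
deg = np.pi / 180

def steer(m, theta):
    """Half-wavelength ULA steering vector for angle theta (radians)."""
    return np.exp(1j * np.pi * np.arange(m) * np.sin(theta))

# Two fully coherent sources (identical waveform) at -20 and 30 degrees.
doas = np.array([-20, 30]) * deg
s = rng.standard_normal(N) + 1j * rng.standard_normal(N)  # shared waveform
A = np.column_stack([steer(M, th) for th in doas])
x = A @ np.vstack([s, 0.8 * s]) + 0.05 * (
    rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))

# Forward spatial smoothing: average the covariances of overlapping
# subarrays to restore the covariance rank lost to coherency.
R = sum(np.cov(x[k:k + Ms]) for k in range(M - Ms + 1)) / (M - Ms + 1)
Ri = np.linalg.inv(R)

def capon(theta):
    """Capon spatial spectrum P(theta) = 1 / (a^H R^-1 a)."""
    a = steer(Ms, theta)
    return 1 / np.real(a.conj() @ Ri @ a)

# The spectrum should peak near both coherent DOAs, which fails without
# the smoothing step because the source covariance is rank deficient.
for th in doas:
    assert capon(th) > capon(60 * deg)
```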
Efficient scalable hardware architecture for highly performant encoded neural networks
Hugues Wouafo, C. Chavet, P. Coussy, R. Danilo
Pub Date: 2017-10-01. DOI: 10.1109/SiPS.2017.8109986

Different neural network models have been proposed to design efficient associative memories, such as Hopfield networks, Boltzmann machines, and Cogent confabulation. Compared to these classical models, the Encoded Neural Network (ENN) is a recently introduced formalism with proven higher efficiency. This model has been improved through different contributions, such as Clone-based ENNs (CbNNs) and Sparse ENNs (S-ENNs), which enhance either the capacity of the original ENN or its retrieval performance. However, only a few works have explored its hardware implementation for embedded applications. In this paper, we introduce a clone-based sparse neural network model (SC-ENN) that gathers the enhancements of the existing approaches in a single formal model. In addition, we present a dedicated scalable hardware architecture to implement the SC-ENN. This work leads to significant complexity and area reductions without degrading either memorization or retrieval performance. By handling only the most relevant information provided by the model, the proposed approach is far less expensive than state-of-the-art solutions.
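ENNs belong to the family of clique-based associative memories: a message activates one neuron per cluster, storage records the clique connecting them, and retrieval scores candidate neurons by how many stored edges link them to the known part of the message. A heavily hedged sketch of that principle only (cluster sizes and messages are invented; the paper's clone and sparsity mechanisms are not modelled):

```python
from itertools import combinations

C, L = 4, 8   # clusters and neurons per cluster (assumed sizes)
edges = set()  # adjacency of the clique memory

def store(msg):
    """Store a message (one active neuron per cluster) as a clique."""
    nodes = [(c, v) for c, v in enumerate(msg)]
    edges.update(frozenset(p) for p in combinations(nodes, 2))

def retrieve(partial):
    """Fill in erased clusters (None) by maximum clique-edge score."""
    out = list(partial)
    known = [(c, v) for c, v in enumerate(partial) if v is not None]
    for c, v in enumerate(partial):
        if v is None:
            scores = [sum(frozenset([(c, cand), k]) in edges for k in known)
                      for cand in range(L)]
            out[c] = max(range(L), key=scores.__getitem__)
    return out

store([1, 5, 2, 7])
store([3, 0, 6, 4])
assert retrieve([1, None, 2, 7]) == [1, 5, 2, 7]
```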
A robust data-driven approach to the decoding of pyloric neuron activity
Filipa dos Santos, Péter András, D. Collins, K. Lam
Pub Date: 2017-10-01. DOI: 10.1109/SiPS.2017.8110017

The combination of intra- and extracellular recording of small neuronal circuits, such as the stomatogastric nervous system of the crab (Cancer borealis), is well documented and routinely practised. Voltage-sensitive dye imaging (VSDi) is a promising technology for the simultaneous monitoring of neuronal activities in such a system. However, integrating data obtained from optical VSDi with electrophysiological recordings of the lateral ventricular nerve (lvn) is a complex and exacting task. Our early work demonstrated some of the concepts and principles involved. In this paper, we examine and report on the results obtained from applying signal processing techniques to three datasets for which we had both VSDi and lvn data. Whilst significant challenges remain, we show that such an approach offers the possibility of real-time monitoring using automated analysis of VSDi data streams, without requiring either extracellular (lvn) or intracellular recording.
Forward-substitution-based generalized eigenvalue decomposition processor for MU-MIMO precoding
Chun-An Chen, Chiao-En Chen, Yuan-Hao Huang
Pub Date: 2017-10-01. DOI: 10.1109/SiPS.2017.8109994

To improve spectrum efficiency in wireless communication systems, multiple-input multiple-output (MIMO) technology uses multiple antennas and allows several users to share the same spectrum and antennas through precoding. In the leakage-based precoding technique, generalized eigenvalue decomposition (GEVD) must generate many precoding matrices for all users in the base station to avoid co-channel interference. This paper presents a GEVD algorithm based on a forward-substitution (FS) scheme that avoids matrix inversion operations. The GEVD processor was also designed and implemented in a 40 nm CMOS technology. Synthesis results show that the FS-based GEVD processor reduces area cost by 52% and improves processing throughput by 12% compared to our previous GEVD processor [1], which was based on triangular matrix inversion with block multiplication.
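The idea of replacing explicit inversion with forward substitution can be illustrated with the textbook Cholesky reduction of the generalized problem A v = lambda B v (this is the standard reduction, not the paper's hardware scheme; the matrices are random stand-ins):

```python
import numpy as np

def forward_substitute(L, rhs):
    """Solve L X = rhs for lower-triangular L, row by row."""
    n = L.shape[0]
    X = np.zeros_like(rhs)
    for i in range(n):
        X[i] = (rhs[i] - L[i, :i] @ X[:i]) / L[i, i]
    return X

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n)); A = A + A.T                  # symmetric
B = rng.standard_normal((n, n)); B = B @ B.T + n * np.eye(n)  # positive definite

# Reduce A v = lambda B v to a standard symmetric EVD without inverting B:
# factor B = L L^T, then form C = L^-1 A L^-T with two forward substitutions.
L = np.linalg.cholesky(B)
Y = forward_substitute(L, A)      # Y = L^-1 A
C = forward_substitute(L, Y.T).T  # C = L^-1 A L^-T (symmetric)
w = np.linalg.eigvalsh(C)         # generalized eigenvalues of (A, B)

# Reference check against the explicit-inversion formulation.
w_ref = np.sort(np.linalg.eigvals(np.linalg.inv(B) @ A).real)
assert np.allclose(w, w_ref)
```

Triangular solves like these map to short, regular dependency chains in hardware, which is what makes them cheaper than a full matrix inversion datapath.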
Multiplierless reconfigurable processing element for mixed radix-2/3/4/5 FFTs
F. Qureshi, Muazam Ali, J. Takala
Pub Date: 2017-10-01. DOI: 10.1109/SiPS.2017.8110007

This paper presents area-efficient building blocks for computing the fast Fourier transform (FFT): multiplierless processing elements for computing radix-3 and radix-5 butterflies, and a reconfigurable processing element supporting mixed radix-2/3/4/5 FFT algorithms. The proposed processing elements are based on the Winograd Fourier transform algorithm; however, multiplication is performed by constant multipliers instead of a general complex-valued multiplier. The proposed processing elements are suitable for both pipelined and memory-based FFT architectures where non-power-of-two sizes are required. The results show that the proposed multiplierless processing elements significantly reduce hardware cost in terms of adders.
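A Winograd-factorized radix-3 butterfly shows why constant multipliers suffice: the 3-point DFT reduces to additions, a halving (cos(2*pi/3) = -1/2), and one real constant sin(2*pi/3). A minimal sketch checked against the direct DFT:

```python
import cmath, math

def radix3_butterfly(x0, x1, x2):
    """3-point DFT via the Winograd factorization: one real constant
    multiplication (sin(2*pi/3)) plus a halving, instead of general
    complex multiplies."""
    u = x1 + x2
    v = x0 - 0.5 * u                                # cos(2*pi/3) = -1/2
    w = -1j * math.sin(2 * math.pi / 3) * (x1 - x2)
    return x0 + u, v + w, v - w

# Agrees with a direct 3-point DFT.
x = [1 + 2j, -0.5j, 3.0]
ref = [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / 3) for n in range(3))
       for k in range(3)]
for got, want in zip(radix3_butterfly(*x), ref):
    assert abs(got - want) < 1e-12
```

In fixed-point hardware the single constant becomes a shift-and-add network, which is where the adder-count savings reported above come from.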
A joint spatial texture analysis/watermarking system for digital image authentication
Musab Ghadi, L. Laouamer, Laurent Nana, A. Pascu
Pub Date: 2017-10-01. DOI: 10.1109/SiPS.2017.8109968

A watermarking system ensures the reliability of images transmitted over public networks by asserting their authenticity. The texture of the host image is one of the most useful characteristics that can be exploited to design image authentication systems. Its importance emerges from the principles of the Human Visual System (HVS): modifying the spatial pixels of highly textured regions within the host image increases imperceptibility and robustness against different image processing attacks. Many efficient features known in the literature are used to characterize the texture of the host image, but all of them are intangible. The model proposed in this paper addresses this intangibility by applying a Multi-Criteria Decision Making (MCDM) method to identify the highly textured blocks within the host image that will hold the secret data. The proposed model has been tested on grayscale images, and the experimental results show a high level of imperceptibility and robustness against different singular and hybrid attacks.
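The block-selection step can be sketched with a single texture criterion standing in for the paper's MCDM ranking (block size, the variance criterion, and the test image are all assumptions; the paper combines several texture features):

```python
import numpy as np

def textured_blocks(img, block=8, k=4):
    """Rank non-overlapping blocks by local variance (a single-criterion
    stand-in for an MCDM ranking) and return the top-k block coordinates
    as candidate embedding locations."""
    h, w = img.shape
    scores = []
    for r in range(0, h - block + 1, block):
        for c in range(0, w - block + 1, block):
            scores.append((img[r:r + block, c:c + block].var(), (r, c)))
    scores.sort(reverse=True)
    return [pos for _, pos in scores[:k]]

rng = np.random.default_rng(0)
img = np.zeros((32, 32))
img[8:16, 8:16] = rng.integers(0, 256, (8, 8))  # one highly textured block
assert textured_blocks(img, k=1) == [(8, 8)]
```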
Energy-efficient approximate Wallace-tree multiplier using significance-driven logic compression
Issa Qiqieh, R. Shafik, Ghaith Tarawneh, D. Sokolov, Shidhartha Das, Alexandre Yakovlev
Pub Date: 2017-10-01. DOI: 10.1109/SiPS.2017.8109990

In this paper, we propose an energy-efficient approximate multiplier design approach. Fundamental to this approach is configurable lossy logic compression, coupled with low-cost error mitigation. The logic compression reduces the number of partial-product rows according to progressive bit significance, thereby decreasing the number of reduction stages in the Wallace-tree accumulation. This yields substantially lower logic counts and shorter critical paths, at the cost of errors in the less significant bits. These errors are minimised through parallel error-detection logic and a compensation vector. To validate the effectiveness of the approach, multiple 8-bit multipliers were designed and synthesized using Synopsys Design Compiler with different logic compression levels. Post-synthesis experiments showed the trade-offs between energy and accuracy for these compression levels, featuring up to 70% reduction in power-delay product (PDP) and 60% lower area in the case of a multiplier with 4-bit logic compression. These gains are achieved at a low loss of accuracy, with a mean relative error estimated below 0.0554. To demonstrate the impact of approximation on a real application, a case study of an image convolution filter was investigated extensively, showing up to 62% (without error compensation) and 45% (with error compensation) energy savings when processing images with a multiplier using 4-bit logic compression.
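The energy/accuracy trade can be sketched behaviourally (not at gate level): here an exact accumulation followed by truncation of the k low-order bits stands in for column-wise logic compression, and a midpoint constant stands in for the paper's compensation vector; both simplifications are assumptions for illustration.

```python
def approx_mult(a, b, k=4, compensate=True):
    """8-bit multiply that discards the k least-significant result bits
    (a behavioural stand-in for lossy logic compression) and optionally
    adds a constant midpoint compensation for the truncated part."""
    acc = 0
    for i in range(8):             # accumulate the partial-product rows
        if (b >> i) & 1:
            acc += a << i
    approx = (acc >> k) << k       # drop the low-significance bits
    if compensate:
        approx += 1 << (k - 1)     # expected value of the dropped bits
    return approx

# Mean relative error over all nonzero 8-bit operand pairs for k = 4.
errs = [abs(approx_mult(a, b) - a * b) / (a * b)
        for a in range(1, 256) for b in range(1, 256)]
mean_rel_err = sum(errs) / len(errs)
assert mean_rel_err < 0.05  # small on average, as in the paper's regime
```

Even this crude model shows the shape of the trade-off: the error is bounded by the dropped significance, so the mean relative error stays small while k low-order columns of reduction logic are removed.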