Pub Date : 2020-07-01 DOI: 10.1109/SPCOM50965.2020.9179519
Shreekantha Nadig, S. Chakraborty, Anuj K. Shah, Chaitanaya Sharma, V. Ramasubramanian, Sachit Rao
End-to-end Automatic Speech Recognition (ASR) models with attention, especially joint Connectionist Temporal Classification (CTC) and attention encoder-decoder models, have shown promising results. In this joint CTC and attention framework, misalignment of attention with the ground truth is not penalised, as the focus is on optimising only the CTC and attention cost functions. In this paper, a function that additionally minimizes alignment errors is introduced. This function is expected to enable the ASR system to attend to the right part of the input sequence and, in turn, minimize alignment and transcription errors. We also implement a dynamic weighting of the losses corresponding to the tasks of CTC, attention, and alignment. We demonstrate that in many cases, the proposed design framework results in better performance and faster convergence. We show results on two datasets, TIMIT and Librispeech 100 hours, for the phone recognition task, taking the alignments from a previously trained monophone Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) system.
Title : Jointly learning to align and transcribe using attention-based alignment and uncertainty-to-weigh losses
Venue : 2020 International Conference on Signal Processing and Communications (SPCOM)
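The abstract above does not give the formula used for the dynamic weighting of the CTC, attention, and alignment losses. As a rough sketch only, the block below uses a simplified form of the widely used uncertainty-based weighting, in which each task loss L_i is scaled by exp(-s_i) and regularised by s_i, with s_i a learnable log-variance. The function name, the loss values, and the log-variance values are all hypothetical; in training, the s_i would be updated by the optimiser rather than set by hand.

```python
import math

def uncertainty_weighted_loss(losses, log_vars):
    """Combine task losses as sum_i exp(-s_i) * L_i + s_i, where s_i = log(sigma_i^2).

    This is an assumed, simplified weighting form, not taken from the paper.
    """
    if len(losses) != len(log_vars):
        raise ValueError("need one log-variance per loss")
    return sum(math.exp(-s) * loss + s for loss, s in zip(losses, log_vars))

# Hypothetical per-batch loss values for the three tasks (CTC, attention, alignment).
losses = [2.0, 1.0, 0.5]

# With every log-variance at 0, each task has weight 1 and the result is the plain sum.
equal = uncertainty_weighted_loss(losses, [0.0, 0.0, 0.0])

# Raising the first log-variance to 1.0 shrinks that task's weight to exp(-1)
# and adds a penalty of 1.0, so the weight cannot grow without cost.
down_weighted = uncertainty_weighted_loss(losses, [1.0, 0.0, 0.0])
```

The additive s_i term is what keeps the weights from collapsing to zero: lowering a task's weight always costs something in the total.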
Pub Date : 2020-07-01 DOI: 10.1109/SPCOM50965.2020.9179502
Renuka Mannem, H. Jyothi, Aravind Illa, P. Ghosh
With advances in machine learning techniques, several speech-related applications deploy end-to-end models to learn relevant features from the raw speech signal. In this work, we focus on the speech rate estimation task, using an end-to-end model to learn representations from raw speech in a data-driven manner. We propose an end-to-end model that comprises a 1-D convolutional layer to extract representations from raw speech and a convolutional dense neural network (CDNN) to predict speech rate from these representations. The primary aim of the work is to understand the nature of the representations learned by the end-to-end model for the speech rate estimation task. Experiments are performed using the TIMIT corpus, in seen and unseen subject conditions. Experimental results reveal that the frequency responses of the learned 1-D CNN filters are low-pass in nature, and that the center frequencies of the majority of the filters lie below 1000 Hz. Comparing the proposed end-to-end system with the baseline MFCC-based approach, we find that the performance of the features learned with the CNN is on par with that of MFCC.
Title : Speech rate estimation using representations learned from speech with convolutional neural network
Venue : 2020 International Conference on Signal Processing and Communications (SPCOM)
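The filter analysis described in the abstract above amounts to examining the frequency response of each learned 1-D kernel. A minimal sketch of that kind of check follows, assuming a kernel is available as a plain list of taps. The 5-tap moving-average kernel is a hypothetical stand-in for a learned filter, the 16 kHz sampling rate is an assumption, and the peak of the magnitude response is used here as a simple proxy for the center frequency.

```python
import cmath
import math

def magnitude_response(taps, n_bins=256):
    """Return |H(e^{jw})| at n_bins frequencies evenly spaced over [0, pi]."""
    response = []
    for k in range(n_bins):
        w = math.pi * k / (n_bins - 1)
        h = sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(taps))
        response.append(abs(h))
    return response

def peak_frequency_hz(taps, sample_rate_hz, n_bins=256):
    """Frequency in Hz at which the magnitude response is largest."""
    mags = magnitude_response(taps, n_bins)
    k = max(range(n_bins), key=mags.__getitem__)
    # Bin n_bins - 1 corresponds to the Nyquist frequency, sample_rate_hz / 2.
    return (k / (n_bins - 1)) * sample_rate_hz / 2.0

# Hypothetical 5-tap moving-average kernel standing in for a learned filter.
taps = [0.2] * 5
mags = magnitude_response(taps)
peak = peak_frequency_hz(taps, sample_rate_hz=16000)  # assumed sampling rate
```

For this stand-in kernel the response is largest at DC and much smaller at the Nyquist frequency, which is the low-pass shape the abstract reports for the learned filters.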
Pub Date : 2020-07-01 DOI: 10.1109/SPCOM50965.2020.9179496
Mallampalli Kapardi, Satya Patel, Raghu Sesha Iyengar, K. S. Sridharan, M. Raghavan
Supervised learning on image data demands large amounts of annotated image data. Annotation is predominantly a tool-assisted manual activity and increasingly accounts for a large share of the budget in machine learning systems development, owing to the time involved and the large workforce needed to annotate large databases. Instead of the predominant practice of drawing bounding boxes with a mouse cursor, we propose a more natural human-computer interface: the human gaze. We propose an image annotation technique that uses a novel protocol for acquiring gaze data to create a polygon around the object rather than a bounding box. In this study, the method is outlined and the results are compared with manually created annotations. The technique can be used to annotate existing image databases or to create new annotated databases by simultaneous image acquisition and annotation.
Title : Tool for image annotation based on gaze
Venue : 2020 International Conference on Signal Processing and Communications (SPCOM)
Pub Date : 2020-07-01 DOI: 10.1109/SPCOM50965.2020.9179632
H. N. Harikrishnan, N. Nagaraj
Neuromorphic computing systems are biologically inspired, with the aim of understanding the rich structure and behaviour of biological neural networks so that novel learning architectures can be designed in both software and hardware. Traditional machine learning and deep neural network architectures are only weakly inspired by the human brain. In this work, we propose a novel ‘neurochaos’ inspired hybrid machine learning architecture for classification. Specifically, we extract four ‘neurochaos’ features – firing time, firing rate, energy and entropy of the chaotic neural firing – from the neurons in the ChaosNet architecture (which we have recently proposed). These are used to train a linear Support Vector Machine classifier. Such a hybrid approach yields superior performance in the low training sample regime on synthetically generated and real-world datasets. Our proposed method could be viewed as a novel application of chaos as a kernel trick and has the potential to be combined with other machine learning algorithms.
Title : Neurochaos Inspired Hybrid Machine Learning Architecture for Classification
Venue : 2020 International Conference on Signal Processing and Communications (SPCOM)
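The abstract above names four features of a chaotic neural trajectory but does not define them. The sketch below is one plausible reading, offered purely as an illustration: a single neuron is modelled as a skew-tent map that iterates from an initial activity `q` until it comes within `eps` of the input stimulus. The map, the parameter values (`q`, `b`, `eps`, `max_iter`), the stopping rule, and all four feature definitions are assumptions made for this example and are not taken from the abstract.

```python
import math

def skew_tent(x, b):
    """Skew-tent map on [0, 1] with skew parameter b (assumed neuron model)."""
    return x / b if x < b else (1.0 - x) / (1.0 - b)

def neurochaos_features(stimulus, q=0.34, b=0.499, eps=0.01, max_iter=10000):
    """Iterate from q until within eps of the stimulus; summarise the trajectory."""
    x = q
    trajectory = []
    while abs(x - stimulus) >= eps and len(trajectory) < max_iter:
        trajectory.append(x)
        x = skew_tent(x, b)
    firing_time = len(trajectory)  # number of iterations before stopping
    if firing_time == 0:
        return {"firing_time": 0, "firing_rate": 0.0, "energy": 0.0, "entropy": 0.0}
    # Fraction of the trajectory spent above the threshold b.
    p1 = sum(1 for v in trajectory if v > b) / firing_time
    # Shannon entropy (bits) of the binary above/below-threshold symbol sequence.
    entropy = -sum(p * math.log2(p) for p in (p1, 1.0 - p1) if p > 0.0)
    return {
        "firing_time": firing_time,
        "firing_rate": p1,
        "energy": sum(v * v for v in trajectory),
        "entropy": entropy,
    }

# Hypothetical normalised input value in [0, 1].
features = neurochaos_features(0.7)
```

Under these assumptions, each input attribute yields four numbers, which could then be concatenated and passed to a linear classifier such as the SVM mentioned in the abstract.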
Pub Date : 2020-07-01 DOI: 10.1109/SPCOM50965.2020.9179612
K. Deka, S. Sharma
This paper presents a puncturing technique for designing length-compatible polar codes. The bits to be punctured are identified with the help of differential evolution (DE). A DE-based optimization framework is developed in which the sum of the bit-error-rate (BER) values of the information bits is minimized. We identify a set of bits that need not be considered for puncturing in the case of additive white Gaussian noise (AWGN) channels. This reduces the size of the set of candidate puncturing patterns. Simulation results confirm the superiority of the proposed technique over other state-of-the-art puncturing methods.
Title : Design of Puncturing for Length-Compatible Polar Codes Using Differential Evolution
Venue : 2020 International Conference on Signal Processing and Communications (SPCOM)
Pub Date : 2020-07-01 DOI: 10.1109/spcom50965.2020.9179615
Title : SPCOM 2020 Front Matter
Venue : 2020 International Conference on Signal Processing and Communications (SPCOM)
Pub Date : 2020-07-01 DOI: 10.1109/SPCOM50965.2020.9179574
S. Dey, E. Sharma, Rohit Budhiraja
Multi-pair two-way massive multiple-input multiple-output (mMIMO) relaying is being widely investigated. Most spectral efficiency (SE) investigations in mMIMO relaying assume ideal hardware. We consider a multi-pair two-way mMIMO half-duplex (HD) relay with user and relay hardware impairments. We derive a novel closed-form SE expression with maximum ratio relay processing and show that the SE, primarily due to the user hardware impairments, asymptotically saturates to a finite value even as the number of relay antennas N goes to infinity. We also scale the HD relay hardware impairments as $N^z$ with $z \geq 0$, and analyze the asymptotic SE limits for four different power scaling schemes. We use these limits to investigate the rate of increase in relay hardware impairments with increasing N that can be tolerated without compromising the SE.
Title : Impact of User and Relay Hardware Impairments on Spectral Efficiency of HD Massive MIMO Relay
Venue : 2020 International Conference on Signal Processing and Communications (SPCOM)
Pub Date : 2020-07-01 DOI: 10.1109/SPCOM50965.2020.9179580
S. K. Vankayala, S. AshokKrishnanK., RaviTeja Gundeti, Konchady Gautam Shenoy
We consider a novel mechanism to pool HARQ (Hybrid Automatic Repeat Request) memory at the UE (User Equipment). In legacy systems, each carrier is allocated a separate section of the total HARQ memory. By pooling this memory and allocating it as HARQ requests arrive, we significantly improve memory utilization. Moreover, we can accommodate a larger fraction of arriving HARQ requests, thus increasing HARQ throughput without increasing the buffer requirement at the UE. In this work, we model the HARQ memory system as a multiserver queue and obtain expressions for the dropping probability and memory occupancy. We compare the pooling system to the legacy technology in an asymptotic regime, which is a good approximation in cases where the ratio of the largest to the smallest packet size is large. This regime holds for scenarios with large Transport Block sizes, such as 5G New Radio. In this regime, under a large load factor, we show that the pooling mechanism reduces the blocking probability and uses fewer resources.
Title : On the Improved Memory Utilization in HARQ Pooling
Venue : 2020 International Conference on Signal Processing and Communications (SPCOM)
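The intuition behind the pooling gain described above can be illustrated with the classical Erlang B formula for an M/M/c/c loss system. This is a deliberately simplified stand-in, not the paper's model: it assumes equal-size requests, Poisson arrivals, and exponential holding times, and the slot counts and offered loads below are hypothetical.

```python
def erlang_b(servers, offered_load):
    """Blocking probability of an M/M/c/c loss system via the standard recursion.

    B(0) = 1 and B(c) = A * B(c - 1) / (c + A * B(c - 1)), with A in Erlangs.
    """
    b = 1.0
    for c in range(1, servers + 1):
        b = offered_load * b / (c + offered_load * b)
    return b

# Hypothetical setup: two carriers, each offering 8 Erlangs of HARQ traffic.
# Legacy scheme: each carrier owns a private partition of 10 memory slots.
partitioned = erlang_b(servers=10, offered_load=8.0)

# Pooled scheme: both carriers share all 20 slots and the loads add up.
pooled = erlang_b(servers=20, offered_load=16.0)
```

With the same total memory and the same total load, the pooled system blocks fewer requests than either private partition, which matches the direction of the result stated in the abstract.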
Pub Date : 2020-07-01 DOI: 10.1109/SPCOM50965.2020.9179563
Rupender Singh, M. Rawat, Anshul Jaiswal
This article focuses on a dual-hop simultaneous wireless information and power transfer (SWIPT) single-input-multiple-output (SIMO) system having one free-space optical (FSO) link followed by one radio frequency (RF) link. We assume that the RF source conveys secure information to the FSO receiver through an intermediate relay equipped with multiple antennas, which uses the decode-and-forward relaying technique. The FSO link is subject to pointing error and atmospheric turbulence, which is modeled by a Gamma-Gamma distribution, and the RF link suffers from Fisher-Snedecor F fading. We investigate the effects of the number of antennas, pointing error, atmospheric turbulence, detection technology, electrical signal-to-noise ratio of the FSO link, and energy harvesting on the performance of the proposed SIMO mixed FSO/RF SWIPT framework. More specifically, we derive unified analytical expressions for statistical channel characteristics such as the probability density function, moments, amount of fading, outage probability, and ergodic capacity. Based on these results, a physical layer security analysis is carried out, and analytical expressions for the secure outage probability and the strictly positive secrecy capacity are derived.
Title : Mixed FSO/RF SIMO SWIPT Decode-and-Forward Relaying Systems
Venue : 2020 International Conference on Signal Processing and Communications (SPCOM)
Pub Date : 2020-07-01 DOI: 10.1109/SPCOM50965.2020.9179617
R. Ammu, N. Sinha
Automatic image segmentation and quantification are critical steps in medical image analysis. The main challenges in medical image segmentation arise from the imbalance in data distribution and the spatial variations of the region of interest (ROI). An ideal segmentation should extract all kinds of segments irrespective of size, shape and position. Commonly used metrics such as accuracy, IoU and the Dice similarity coefficient treat all detected pixels in the same way. However, the detection of smaller segments is critical in medical analysis, since it helps in early treatment of the disease, and smaller segments are also easier to miss. Hence, segmentation evaluation must give larger weight to pixels in smaller segments than to those in bigger ones. We propose a novel evaluation metric for segmentation performance that emphasizes smaller segments by assigning a higher weight to their pixels. Weighted false positives are also considered in deriving the new metric, named “SSEGEP” (Small SEGment Emphasized Performance evaluation metric), which ranges from 0 (bad) to 1 (good). The proposed approach has been applied to two publicly available real medical data sets of CT modality, consisting of scans of the liver and pancreas of 131 and 107 subjects respectively, and the results have been compared with existing evaluation metrics. Statistical significance testing is performed to quantify the relevance of the proposed approach. In comparison to the Dice similarity coefficient, SSEGEP resulted in a promising p-value of the order of $10^{-18}$ for hepatic tumor. The proposed metric is found to perform better for images having multiple segments for a single label and where the regions of interest are not localized.
Title : Small Segment Emphasized Performance Evaluation Metric for Medical Images
Venue : 2020 International Conference on Signal Processing and Communications (SPCOM)
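The motivation stated in the abstract above, that pixel-pooled metrics can hide a missed small segment, can be seen with a toy example. The sketch below computes the standard Dice similarity coefficient on sets of pixel coordinates and a simple per-segment recall. It does not implement SSEGEP, whose weighting is not given in the abstract, and the segment shapes and sizes are hypothetical.

```python
def dice(pred, truth):
    """Dice similarity coefficient between two sets of pixel coordinates."""
    if not pred and not truth:
        return 1.0
    return 2.0 * len(pred & truth) / (len(pred) + len(truth))

def segment_recall(pred, segment):
    """Fraction of one ground-truth segment's pixels that were detected."""
    return len(pred & segment) / len(segment)

# Hypothetical ground truth: one large 10x10 segment and one small 2x2 segment.
large = {(r, c) for r in range(10) for c in range(10)}
small = {(r, c) for r in range(20, 22) for c in range(20, 22)}
truth = large | small

# A prediction that recovers the large segment but misses the small one entirely.
pred = set(large)

overall_dice = dice(pred, truth)            # dominated by the large segment
small_recall = segment_recall(pred, small)  # exposes the missed small segment
```

Here the overall Dice score stays close to 1 even though the small segment is missed completely, which is the kind of blind spot a small-segment-weighted metric is meant to address.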