Title: Hybrid Facial Expression Analysis Model using Quantum Distance-based Classifier and Classical Support Vector Machine
Authors: K. Rengasamy, Piyush Joshi, Vvs Raveendra
Pub Date: 2023-05-04 | DOI: 10.1109/ESDC56251.2023.10149860
Rapid advancements in image and video processing technologies are poised to have a remarkable impact on a wide range of industries. A significant challenge in these technologies lies in identifying the features fed to image classification algorithms. Although all classification algorithms can identify, extract, and classify the features of a given image, their accuracy is directly proportional to the number of sample points taken from the image using a sampling technique. As accuracy improves with a larger number of sample points, the time required to process them grows substantially. These challenges demand enormous computing power, and the exceptional computing power promised by quantum computers is expected to meet this growing demand. To address these challenges effectively, we have chosen a specific problem, Facial Expression Analysis, to explore in depth and arrive at a purposeful approach that delivers the desired outcome. The purpose of this paper is two-pronged: first, to perform a comparative study of the accuracy and performance of classical and quantum image processing algorithms on classical and quantum computers, respectively; second, to devise a novel hybrid model using a quantum distance-based classifier augmented with a classical linear support vector machine to overcome the limitations observed. Sample image features derived from the quantum classifier were used to train the linear classifier, and the results were better than those obtained from the classical distance-based classifier. Overall, the novel hybrid model appears to be a promising solution for image classification problems. Our future work will focus on more sophisticated use of a linear classification algorithm in quantum computing.
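As a rough illustration of the hybrid idea described above, the sketch below simulates a distance-based classifier classically via state-vector fidelities and feeds its per-class scores to a classical linear SVM. The data, encoding, and scoring scheme are assumptions for demonstration only, not the authors' implementation.

```python
# Classical stand-in for a swap-test style quantum distance classifier,
# cascaded with a classical linear SVM (the hybrid structure named above).
import numpy as np
from sklearn.svm import LinearSVC

def amplitude_encode(x):
    """Normalize a feature vector so it can be read as a quantum state vector."""
    x = np.asarray(x, dtype=float)
    return x / (np.linalg.norm(x) + 1e-12)

def distance_scores(sample, train_X, train_y, classes):
    """Per-class average fidelity |<sample|x_i>|^2 against training states."""
    s = amplitude_encode(sample)
    scores = []
    for c in classes:
        members = train_X[train_y == c]
        fid = [np.dot(s, amplitude_encode(m)) ** 2 for m in members]
        scores.append(np.mean(fid))
    return np.array(scores)

# Hypothetical toy data: rows stand in for facial-expression feature vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 16))
y = rng.integers(0, 2, size=60)
classes = np.unique(y)

# Stage 1: quantum-style distance features; Stage 2: classical linear SVM.
features = np.array([distance_scores(x, X, y, classes) for x in X])
svm = LinearSVC(max_iter=5000).fit(features, y)
print("training accuracy:", svm.score(features, y))
```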
{"title":"Hybrid Facial Expression Analysis Model using Quantum Distance-based Classifier and Classical Support Vector Machine","authors":"K. Rengasamy, Piyush Joshi, Vvs Raveendra","doi":"10.1109/ESDC56251.2023.10149860","DOIUrl":"https://doi.org/10.1109/ESDC56251.2023.10149860","url":null,"abstract":"Rapid advancements in image and video processing technologies are poised to create remarkable impacts on a wide range of industries. A significant challenge in these processing technologies resides in identifying the features fed for image classification algorithms. Though all classification algorithms could identify, extract and classify the features of a given image, their accuracy is directly proportional to the number of sample points taken from the image using a sampling technique. As the accuracy improves with a substantial number of sample points, the time consumed to process them looms large. These challenges beseech enormous computing power. Quantum computers avowed exceptional computing power is expected to bridge the growing demands. To address these challenges effectively, we have chosen a specific problem, Facial Expression Analysis, to explore in-depth and arrive at a purposeful approach to deliver the desired outcome. The purpose of this paper is two-pronged. Perform a comparative study of accuracy and performance of classical and quantum image processing algorithms in classical and quantum computers, respectively. Secondly, devise a novel hybrid model using a quantum distance-based classifier augmented with a classical linear support vector machine to overcome the limitations observed. Sample image features derived from the quantum classifier were used to train the linear classifier. The results were observed to be better relative to results from the classical distance-based classifier. Holistically, the novel hybrid model is observed as a promising solution for all image classification problems. Our future work will focus on sophisticated usage of a linear classification algorithm in quantum computing.","PeriodicalId":354855,"journal":{"name":"2023 11th International Symposium on Electronic Systems Devices and Computing (ESDC)","volume":"127 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121737178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Traffic Congestion and Emergency Vehicle Responsive Traffic Signal Control in Resource Constrained Environment
Authors: Sagar Bapodara, Shyam Mesvani, Manish Chaturvedi, Pruthvish Rajput
Pub Date: 2023-05-04 | DOI: 10.1109/ESDC56251.2023.10149873
With the increase in the number of vehicles on the road, traffic congestion has become a major problem in metropolitan areas. Generally, the traffic flow through a junction is controlled using static traffic lights, which cannot adapt to the real-time traffic condition at the junction and do not prioritize the movement of certain types of vehicles. Emergency vehicles (e.g., ambulances, fire engines, and police vehicles) play a crucial role in life-threatening situations, and ensuring their movement through a congested junction with minimal delay is essential. In this paper, we propose an adaptive and efficient traffic signal control system for less lane-disciplined heterogeneous (mixed) traffic that can be easily integrated with existing static traffic lights in a resource-constrained environment. A sound sensor-based emergency vehicle detection system is designed that accurately detects and classifies emergency vehicles by identifying their unique siren sounds. The traffic camera data are processed in real time to compute the passenger car unit (PCU) counts at every approach of a junction and to detect emergency vehicles that do not generate siren sounds. The experimental results show 100% accuracy in emergency vehicle detection, more than 95% accuracy in emergency vehicle classification, and 65% accuracy in vehicle classification and PCU counting. We also design a queuing theory-based cost function that considers the prevailing traffic condition and the presence of priority vehicle(s) at a junction. The cost function can be used to adapt the green phase of the different approaches at a junction to improve vehicle flow through the junction while minimizing the delay for emergency vehicles.
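The abstract does not give the cost function's exact form; the sketch below shows, under assumed weights and thresholds, how PCU counts, waiting time, and emergency-vehicle presence could be combined into a cost that selects and sizes the next green phase.

```python
# Illustrative cost-based green-phase selection; weights and the functional
# form are assumptions chosen for demonstration, not the paper's formulation.

def approach_cost(pcu_count, waiting_time_s, has_emergency,
                  w_queue=1.0, w_wait=0.05, emergency_bonus=100.0):
    """Higher cost -> the approach is more in need of a green phase."""
    cost = w_queue * pcu_count + w_wait * waiting_time_s
    if has_emergency:
        cost += emergency_bonus   # strongly prioritize emergency vehicles
    return cost

def pick_green_phase(approaches, min_green=10, max_green=60):
    """approaches: dict name -> (pcu_count, waiting_time_s, has_emergency)."""
    costs = {name: approach_cost(*state) for name, state in approaches.items()}
    chosen = max(costs, key=costs.get)
    total = sum(costs.values()) or 1.0
    # Green time proportional to the chosen approach's share of the total cost.
    green = min_green + (max_green - min_green) * costs[chosen] / total
    return chosen, round(green), costs

junction = {
    "north": (12, 40, False),
    "south": (7, 25, False),
    "east":  (3, 10, True),    # emergency vehicle detected on this approach
    "west":  (9, 35, False),
}
print(pick_green_phase(junction))
```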
{"title":"Traffic Congestion and Emergency Vehicle Responsive Traffic Signal Control in Resource Constrained Environment","authors":"Sagar Bapodara, Shyam Mesvani, Manish Chaturvedi, Pruthvish Rajput","doi":"10.1109/ESDC56251.2023.10149873","DOIUrl":"https://doi.org/10.1109/ESDC56251.2023.10149873","url":null,"abstract":"With the increase in the number of vehicles on the road, traffic congestion has become a major problem in metropolitan areas. Generally, the traffic flow through a junction is controlled using static traffic lights which are unable to adapt to the real-time traffic condition at a junction and do not prioritize the movement of certain types of vehicles. Emergency vehicles (e.g. ambulance, fire, police, etc.) play a crucial role in all life-threatening situations, and ensuring their movement through a congested junction with minimal time delay is essential.In this paper, we propose an adaptive and efficient traffic signal control system for less-lane disciplined heterogeneous (mixed) traffic that can be easily integrated with the existing static traffic lights in a resource-constrained environment. A sound sensor-based emergency vehicle detection system is designed that accurately detects and classifies emergency vehicles by identifying their unique siren sound. The traffic camera data are processed in real-time to compute the PCU counts at every approach of a junction and to detect emergency vehicles that do not generate siren sounds. The experiment results show 100% accuracy in emergency vehicle detection, more than 95% accuracy in the emergency vehicle classification, and 65% accuracy in vehicle classification and PCU count. We also design a queuing theory-based cost function that considers the prevailing traffic condition and the presence of priority vehicle(s) at a junction. The cost function can be used to adapt the green phase of different approaches at a junction to improve the vehicle flow through the junction while minimizing the delay for the emergency vehicles.","PeriodicalId":354855,"journal":{"name":"2023 11th International Symposium on Electronic Systems Devices and Computing (ESDC)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132604786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Learning Semantic Representations and Discriminative Features in Unsupervised Domain Adaptation
Authors: Rushendra Sidibomma, R. Sanodiya
Pub Date: 2023-05-04 | DOI: 10.1109/ESDC56251.2023.10149872
In domain adaptation, the goal is to train a neural network on the source domain and achieve good accuracy on the target domain. In such a scenario, it is important to transfer knowledge from the labelled source domain to the unlabelled target domain because of the expensive cost of manual labelling. Following recent work, feature-level alignment appears to be the most promising direction in unsupervised domain adaptation. However, in most recent works using such feature alignment, the semantic information present in the labelled source domain has not been exploited. Among the works that have tried to learn these semantic representations, discriminative features have not been taken into consideration, which results in lower accuracy on the target domain. In this paper, we present a novel approach, the joint discriminative and semantic transfer network (JDSTN), that not only aligns the semantic representations of the source and target domains but also enhances discriminative features, thereby improving accuracy significantly. This is achieved by using pseudo-labels to align the feature centroids of the source and target domains while introducing losses that promote the learning of discriminative features.
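A minimal sketch of the centroid-alignment idea, assuming source labels and target pseudo-labels are already available; the exact JDSTN losses are not reproduced here.

```python
# Per-class centroids are computed from labelled source features and
# pseudo-labelled target features; their squared distances form a
# semantic-alignment loss (illustrative only).
import numpy as np

def class_centroids(features, labels, num_classes):
    dim = features.shape[1]
    cents = np.zeros((num_classes, dim))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            cents[c] = features[mask].mean(axis=0)
    return cents

def centroid_alignment_loss(src_feat, src_lab, tgt_feat, tgt_pseudo, num_classes):
    cs = class_centroids(src_feat, src_lab, num_classes)
    ct = class_centroids(tgt_feat, tgt_pseudo, num_classes)
    return np.mean(np.sum((cs - ct) ** 2, axis=1))

rng = np.random.default_rng(1)
src = rng.normal(size=(128, 64)); src_y = rng.integers(0, 10, 128)
tgt = rng.normal(size=(128, 64))
tgt_pseudo = rng.integers(0, 10, 128)   # in practice: argmax of the classifier output
print("semantic alignment loss:",
      centroid_alignment_loss(src, src_y, tgt, tgt_pseudo, 10))
```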
{"title":"Learning Semantic Representations and Discriminative Features in Unsupervised Domain Adaptation","authors":"Rushendra Sidibomma, R. Sanodiya","doi":"10.1109/ESDC56251.2023.10149872","DOIUrl":"https://doi.org/10.1109/ESDC56251.2023.10149872","url":null,"abstract":"In domain adaptation, the goal is to train a neural network on the source domain and obtain a good accuracy on the target domain. In such a scenario, it is important to transfer the knowledge from the labelled source domain to the unlabelled target domain due to the expensive cost of manual labelling. Following the trail of works in the recent time, feature level alignment seems to be the most promising direction in unsupervised domain adaptation. In most of the recent works using this feature alignment, the semantic information present in the labelled source domain has not been exploited. Among the works that have tried to learn this semantic representations, the discriminative features have not been taken into consideration which results in lower accuracy on target domain. In this paper, we present a novel approach, joint discriminative and semantic transfer network (JDSTN) that not only aligns the semantic representations of source and target domain, but also enhances the discriminative features and thereby improving the accuracy significantly. This is achieved by using pseudo-labels to align the feature centroids of source and target domains while introducing losses that promote the learning of discriminative features.","PeriodicalId":354855,"journal":{"name":"2023 11th International Symposium on Electronic Systems Devices and Computing (ESDC)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122687631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: VLSI Architecture of Generalized Pooling for Hardware Acceleration of Convolutional Neural Networks
Authors: Akash Ther, Binit Kumar Pandit, Ayan Banerjee
Pub Date: 2023-05-04 | DOI: 10.1109/ESDC56251.2023.10149878
Convolutional Neural Networks (CNNs) handle a massive variety of datasets with great accuracy for various image processing and computer vision applications. However, this comes at the cost of large hardware resource requirements that are computationally and energy intensive. Efficient hardware and algorithmic optimization is therefore needed for real-time CNN inference when deploying a CNN model. This paper proposes a novel VLSI architecture for the generalized pooling operation to accelerate CNN inference in real time. The generalized pooling operation adaptively downsamples the huge parameter set generated by the convolutional layer by generating weights according to the input features. It accommodates varying feature maps and preserves significant features, unlike its counterparts, maximum and average pooling. To compute the output of the generalized pooling operation efficiently, the proposed hardware design makes use of the Newton-Raphson reciprocal approximation for division operations, a low number of comparators, and a high degree of parallelism. The proposed architecture is developed and evaluated on Xilinx Vivado 2018.3, with the Zynq UltraScale+ MPSoC ZCU104 Evaluation board as the target device.
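The two hardware ideas named above can be mimicked in software as below: a Newton-Raphson iteration that approximates a reciprocal (so division becomes multiplications) and an input-adaptive pooling that weights window elements by their softmax scores. The weighting scheme is an assumption for illustration, not the paper's RTL design.

```python
import numpy as np

def nr_reciprocal(d, iterations=3):
    """Approximate 1/d with the Newton-Raphson update x_{k+1} = x_k * (2 - d * x_k)."""
    x = 1.0 / np.exp2(np.floor(np.log2(d)))   # cheap power-of-two initial guess near 1/d
    for _ in range(iterations):
        x = x * (2.0 - d * x)
    return x

def generalized_pool(window):
    """Adaptive pooling over a flattened window: softmax-weighted sum,
    with the normalizing division done via the reciprocal approximation."""
    w = np.exp(window - window.max())
    denom = w.sum()
    weights = w * nr_reciprocal(denom)        # instead of w / denom
    return float(np.sum(weights * window))

win = np.array([0.1, 0.8, 0.3, 0.9])
print(generalized_pool(win), "vs max", win.max(), "vs mean", win.mean())
```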
{"title":"VLSI Architecture of Generalized Pooling for Hardware Acceleration of Convolutional Neural Networks","authors":"Akash Ther, Binit Kumar Pandit, Ayan Banerjee","doi":"10.1109/ESDC56251.2023.10149878","DOIUrl":"https://doi.org/10.1109/ESDC56251.2023.10149878","url":null,"abstract":"Convolutional Neural Networks (CNNs) handle a massive variety of datasets with great accuracy for various image processing and computer vision applications. However, it comes at the cost of the requirement of large hardware resources, which are computational and energy extensive. There is a need for efficient hardware and algorithmic optimization for real-time CNN inference while deploying the CNN model. This paper proposes a novel VLSI architecture of generalized pooling operation for hardware acceleration of CNN inference in real-time. Generalized pooling operation adaptively downsamples the huge parameter set generated by the convolutional layer by generating weights as per the input features. It is capable of accommodating varying feature maps and preserves significant features, unlike other counterparts maximum and average pooling. In order to efficiently compute the output of the generalised pooling operation, the proposed hardware design makes use of the Newton-Raphson reciprocal approximation for division operations, a low number of comparators, and a high degree of parallelism. The proposed architecture is developed and tested for performance evaluation on Xilinx Vivado 2018.3, and the target device chosen is Zynq UltraScale + MPSoC ZCU104 Evaluation board.","PeriodicalId":354855,"journal":{"name":"2023 11th International Symposium on Electronic Systems Devices and Computing (ESDC)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126892281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: ML-based techniques for prediction of Ocean currents for underwater vehicles
Authors: Shaik Shakeera, V. Bala Naga Jyothi, H. Venkataraman
Pub Date: 2023-05-04 | DOI: 10.1109/ESDC56251.2023.10149859
Real-time knowledge of dynamic ocean currents plays a significant role in the precise navigation of underwater vehicles. Estimating and predicting ocean currents with traditional methods such as the Navier–Stokes equations is computationally very complex and requires large amounts of historical ocean data to develop numerical models. Hence, in this paper, machine learning based on less complex and easily deployable regression methods is used to identify the best prediction model for ocean currents. All the regression methods were compared using the R2 score, Mean Absolute Error (MAE), and Mean Square Error (MSE). Among all methods, the decision tree regression-based method performed best, with 84% accuracy and minimal error. Qualitative performance is studied using visualizations of data correlation, and heat maps are also generated and compared.
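A minimal sketch of the evaluation workflow described above, using scikit-learn's decision tree regressor and the three reported metrics; the synthetic data stand in for the (unspecified) ocean-current measurements used in the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

rng = np.random.default_rng(42)
X = rng.uniform(size=(500, 4))                    # e.g. depth, temperature, salinity, time
y = 0.5 * X[:, 0] + np.sin(3 * X[:, 1]) + 0.1 * rng.normal(size=500)   # current speed proxy

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = DecisionTreeRegressor(max_depth=6, random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)

print("R2 :", r2_score(y_te, pred))
print("MAE:", mean_absolute_error(y_te, pred))
print("MSE:", mean_squared_error(y_te, pred))
```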
{"title":"ML-based techniques for prediction of Ocean currents for underwater vehicles","authors":"Shaik Shakeera, V. Bala Naga Jyothi, H. Venkataraman","doi":"10.1109/ESDC56251.2023.10149859","DOIUrl":"https://doi.org/10.1109/ESDC56251.2023.10149859","url":null,"abstract":"Dynamic ocean current in real-time plays a significant role for the precise navigation of underwater vehicles. Estimation and prediction of ocean currents with traditional methods such as Navier–Stokes equations, which are computationally very complex and also need huge historical ocean data for developing numerical models. Hence, in this paper Machine Learning, based on less complex and easily deployable regression methods is exercised to identify the best prediction model for ocean currents. Further, all the regression methods performed were compared with the R2 score, Mean Absolute Error (MAE) and Mean Square Error (MSE). Among all methods, the Decision tree regression-based ML method performed best with 84% accuracy with minimal error. Qualitative performance is studied using visualization of data correlation, heat maps are also generated and compared.","PeriodicalId":354855,"journal":{"name":"2023 11th International Symposium on Electronic Systems Devices and Computing (ESDC)","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123758469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: A Review Paper on Contour Estimation Techniques in High-Resolution Automotive Radars
Authors: R. Mathew, Eesh Shekhar Gharat, Siddharth Hooda
Pub Date: 2023-05-04 | DOI: 10.1109/ESDC56251.2023.10149852
By 2024, the market for automated driving systems is anticipated to reach $20 billion, growing at a 25.7 percent annual pace between 2016 and 2024. Contour estimation using high-resolution radars is one of the key functionalities in this growing industry, and many techniques have been proposed for it. KNN-DBSCAN, the Generalized Hough Transform, and the brute-force approach are some of the techniques studied. The size of the encountered radar cross-section (RCS), dependency on heuristics, accuracy, and computational expense are some of the parameters against which the various techniques are compared. Although these parameters are studied in depth, parameters such as radar interference and contamination have not been studied extensively in the literature.
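As an illustration of one surveyed technique, the sketch below clusters a synthetic radar point cloud with DBSCAN before any contour extraction; the parameters (eps, min_samples) and the data are assumptions for demonstration only.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(7)
# Two nearby "objects" plus scattered clutter, in (x, y) metres.
obj_a = rng.normal(loc=(10.0, 2.0), scale=0.3, size=(40, 2))
obj_b = rng.normal(loc=(14.0, -1.0), scale=0.3, size=(40, 2))
clutter = rng.uniform(low=(0, -5), high=(20, 5), size=(15, 2))
detections = np.vstack([obj_a, obj_b, clutter])

labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(detections)
for lbl in sorted(set(labels)):
    pts = detections[labels == lbl]
    name = "clutter" if lbl == -1 else f"cluster {lbl}"
    print(f"{name}: {len(pts)} points, centroid {pts.mean(axis=0).round(2)}")
```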
{"title":"A Review Paper on Contour Estimation Techniques in High-Resolution Automotive Radars","authors":"R. Mathew, Eesh Shekhar Gharat, Siddharth Hooda","doi":"10.1109/ESDC56251.2023.10149852","DOIUrl":"https://doi.org/10.1109/ESDC56251.2023.10149852","url":null,"abstract":"By 2024, the market for automated driving systems is anticipated to reach $20 billion, growing at a 25.7 percent annual pace between 2016 and 2024. Contour estimation through High-Resolution radars is one of the key functionalities of these growing industries and many techniques have been used for it. KNN-DBSCAN, Generalized Hough Transform, and brute force approach are some of the techniques studied. Size of encountered Radar cross-section (RCS), dependency on heuristics, accuracy, and computational expensive are some of the parameters against which comparison of the various techniques is done. Although these parameters are studied in-depth, there are parameters like interference and contamination of RADAR that have not been studied extensively in the literature.","PeriodicalId":354855,"journal":{"name":"2023 11th International Symposium on Electronic Systems Devices and Computing (ESDC)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124696570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Video Label Enhancing and Standardization through Transcription and WikiId Mapping Techniques
Authors: Dinu Thomas, David Pratap, B. Sudha
Pub Date: 2023-05-04 | DOI: 10.1109/ESDC56251.2023.10149851
The volume of video content surpasses that of all other content types on the internet. According to reports from different sources, video traffic accounted for 82% of internet usage in 2022. Video will become even more important in the years to come for user engagement, advertisement and marketing, news, education, and other uses. Video information retrieval therefore becomes an important problem to solve. An accurate and fast video tagging system can support good content recommendations for end users. It also helps to audit content automatically, so that platforms can control content that is politically or morally harmful. There are currently few fast or cost-effective mechanisms to tag user-generated videos. Manual tagging is a costly and highly time-consuming task, and a delay in indexing videos such as news and sports reduces their freshness and relevance. Deep learning techniques have reached maturity for content such as text and images, but this is not the case for videos. Deep learning models need more resources to deal with videos because of their multi-modal nature and temporal behaviour. Apart from that, there are few large-scale video datasets available at present; YouTube-8M is the largest publicly available dataset, and much research has been carried out on it. From our study, all of these works have a potential limitation. For example, YouTube-8M has only around 3.8K video labels, which do not cover all real-world tags, nor the new domains created along with the surge in content traffic. This study aims to address this problem of tag creation through the different methods available, thereby expanding the labels to a much wider set. This work also aims to produce a scalable tagging pipeline that uses multiple retrieval mechanisms and combines their results, and to standardize the retrieved tokens across languages. As an outcome, this work creates a dataset from WikiData that can be used for NLP-based standardization use cases. An attempt has been made to perform disambiguation through WikiId embeddings, and a new WikiData embedding is created in this work that can be used to eliminate noisy tags.
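A hypothetical sketch of the noise-filtering step: candidate tags mapped to (made-up) WikiId embeddings are kept only if they are similar enough to a video context embedding; real WikiData embeddings and the threshold value would come from the pipeline itself.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def filter_noisy_tags(candidate_tags, tag_embeddings, context_embedding, threshold=0.3):
    """Keep tags whose embedding is sufficiently similar to the video context."""
    kept = []
    for tag in candidate_tags:
        emb = tag_embeddings.get(tag)
        if emb is not None and cosine(emb, context_embedding) >= threshold:
            kept.append(tag)
    return kept

rng = np.random.default_rng(3)
tag_embeddings = {t: rng.normal(size=16) for t in ["Q2736", "Q11424", "Q5"]}   # toy WikiIds
context = tag_embeddings["Q11424"] + 0.1 * rng.normal(size=16)                 # video context vector
print(filter_noisy_tags(["Q2736", "Q11424", "Q5"], tag_embeddings, context))
```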
{"title":"Video Label Enhancing and Standardization through Transcription and WikiId Mapping Techniques","authors":"Dinu Thomas, David Pratap, B. Sudha","doi":"10.1109/ESDC56251.2023.10149851","DOIUrl":"https://doi.org/10.1109/ESDC56251.2023.10149851","url":null,"abstract":"Volume of video content surpass all other content types in internet. As per the reports from different sources, video traffic had acquired 82% of internet usage in 2022. Video is going to be more important in the years to come for user engagement, advertisement & marketing, news, education etc. Video information retrieval becomes an important problem to solve in this context. An accurate and fast video tagging system can aid a good content recommendation to the end users. It helps to audit the content automatically thereby platforms can control the contents which are politically and morally harmful. There are not many faster or cost-effective mechanisms to tag user generated videos at this moment. Manual tagging is a costly and highly time taking task. A delay in indexing the videos like news, sports etc., shall reduce its freshness and relevancy. Deep learning techniques have reached its maturity in the contents like text and images, but it is not the case with videos. Deep learning models need more resources to deal with videos due to its multi-modality nature, and temporal behavior. Apart from that, there are not many large-scale video datasets available at this moment. Youtube-8M is the largest dataset which is publicly available as of now. Much research works happened over Youtube-8M dataset. From our study, all these have a potential limitation. For example, in Youtube-8M, Video labels are only around 3.8K which are not covering all real-world tags. It is not covering the new domains which are created along with the surge in the content traffic. This study aims to handle this problem of tag creation through different methods available thereby enhancing the labels to a much wider set. This work also aims to produce a scalable tagging pipeline which uses multiple retrieval mechanisms, combine their results. The work aims to standardize the retrieved tokens across languages. This work creates a dataset as an outcome from ‘WikiData’, which can be used for any NLP based standardization use cases. An attempt has been made to do disambiguation through WikiId embedding. A new WikiData embedding is created in this work, which can be used for eliminating the tags which are noisy.","PeriodicalId":354855,"journal":{"name":"2023 11th International Symposium on Electronic Systems Devices and Computing (ESDC)","volume":"240 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114329586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Impact of Pruning and Quantization: A Light Weight Multi-Sensor Pothole Detection System
Authors: Jaswanth Nidamanuri, Trisanu Bhar, H. Venkataraman
Pub Date: 2023-05-04 | DOI: 10.1109/ESDC56251.2023.10149868
Pothole detection has long been a part of Advanced Driver Assistance Systems (ADAS), and many techniques have been used for it; deep learning-based methods have been particularly successful in this regard. However, accurate localization of potholes may not be possible with only one modality of information. This work explores multi-sensor information fusion (from the accelerometer and gyroscope) to detect potholes. Notably, most existing works propose models such as convolutional neural networks and recurrent or attention models such as Long Short-Term Memory (LSTM) networks, Gated Recurrent Units (GRUs), and Transformers. Despite such proven architectures for learning complex and non-linear representations with attention units, the challenge of real-time deployment on optimized computing devices remains unaddressed. With the proposed approach, efficient deployment is possible on edge devices embedded in the vehicle, providing a reliable ADAS solution for improved driver safety. The investigations and ablation study are two-fold, addressing the trade-off between model size and test accuracy. Significantly, the proposed hybrid architecture, the INN-former with quantization, achieved a size reduction of 16.12% without compromising much on accuracy, with the maximum test accuracy reported at 96.12%. Similarly, pruning achieves a 1.115% size reduction with a minimal difference in test accuracy (85.43%) for the INN-former, and a 3.76% decrease in size with only a 4.95% decrease in test accuracy (reported as 95.43%) for the attention model using GRU/LSTM. Importantly, the proposed work discusses the design parameters for lightweight architectures, investigating pruning and quantization techniques that do not compromise the generalization capability of the models, which is essential for real-time deployment and validation.
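A minimal sketch of the two compression steps discussed above, applied to a single weight matrix: global magnitude pruning and uniform 8-bit quantization. The sizes and sparsity are illustrative; the INN-former architecture and the paper's exact settings are not reproduced.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    threshold = np.quantile(np.abs(weights).ravel(), sparsity)
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

def quantize_uint8(weights):
    """Uniform affine quantization to 8 bits, returning the dequantized copy."""
    lo, hi = weights.min(), weights.max()
    scale = (hi - lo) / 255.0 or 1.0
    q = np.round((weights - lo) / scale).astype(np.uint8)
    return q.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(256, 128)).astype(np.float32)

W_pruned = magnitude_prune(W, sparsity=0.5)
W_quant = quantize_uint8(W)
print("nonzero after pruning:", np.count_nonzero(W_pruned), "/", W.size)
print("fp32 bytes:", W.nbytes, "-> int8 bytes:", W.size)        # roughly 4x smaller storage
print("max quantization error:", np.abs(W - W_quant).max())
```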
{"title":"Impact of Pruning and Quantization: A Light Weight Multi-Sensor Pothole Detection System","authors":"Jaswanth Nidamanuri, Trisanu Bhar, H. Venkataraman","doi":"10.1109/ESDC56251.2023.10149868","DOIUrl":"https://doi.org/10.1109/ESDC56251.2023.10149868","url":null,"abstract":"Pothole Detection has been a part of Advanced Driver Assistant Systems (ADAS) for a long time. To detect potholes, many techniques have been used. Deep learning-based methods have been particularly successful in this regard. However, the localization of the potholes accurately may not be possible only by having one modality of information. This work explores the multi-sensor information fusion (from Accelerometer and Gyroscope) to detect the potholes. Notably, most of the existing works proposed to make use of models such as Convolution Neural Networks, and other Attention models like Long Short-Term Memory (LSTM)’s, Gated Recurrent Units (GRUs), and Transformers. Despite having such proven architectures for complex and non-linear learning representations with attention units, still, the challenge of real-time deployments with optimized computing devices remains unaddressed. With the proposed approach, efficient deployments are possible on edge devices embedded in the vehicle providing a reliable ADAS solution for improved driver safety. The investigations and ablation study from the proposal focus on two-fold addressing the trade-off between model size and test accuracy. Significantly, the proposed hybrid architecture, the INN-former with quantization, achieved a size reduction by 16.12%, not compromising much with the maximum test accuracy reported at 96.12%. Similarly, pruning achieves a 1.115% size reduction with a minimal difference in test accuracy of 85.43% for the INN-former, and a 3.76% decrease in size while only a 4.95% decrease in test accuracy reported as 95.43% with the Attention model making use of GRU/ LSTM. Importantly, the proposed work discusses the design parameters for lightweight architectures investigating the pruning and quantization techniques that are not compromising the generalization capability of the models, which is highly essential for real-time deployments and validation.","PeriodicalId":354855,"journal":{"name":"2023 11th International Symposium on Electronic Systems Devices and Computing (ESDC)","volume":"167 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114837362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Battery Modelling and Performance Analysis of Time-Varying Load Using Simulink
Authors: Mahesh Naggarapu, Shaik Shakeera, H. Venkataraman
Pub Date: 2023-05-04 | DOI: 10.1109/ESDC56251.2023.10149874
In recent years, many areas have gained autonomous capabilities, such as vehicular navigation, aquaculture, and industrial appliances. In the field of vehicular transportation in particular, power is the main constraint on autonomous operation. The battery is the main source of energy storage; rechargeable batteries in particular are used as energy storage systems due to their high energy density. Battery modelling is an indispensable tool for designing a real-time battery management system (BMS) that estimates the run time of autonomous battery-powered systems. In this paper, an easy-to-use Simulink battery model with a time-varying dynamic load is designed as a tool for autonomous vehicles to estimate the State of Charge (SOC). The proposed model comprises a controlled voltage source in series with an internal resistance and a time-varying resistive load. The proposed model differs from the ideal model by 0.01 (1%) and mirrors the general behaviour of the ideal model. The performance of both the ideal and the proposed model is evaluated using the Root Mean Square Error (RMSE), which must be less than 0.4 for the battery model to be accepted. The proposed model achieves an RMSE of 0.1 and estimates the SOC within widely acceptable limits.
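A Python analogue (not the authors' Simulink model) of the described circuit: a controlled voltage source V_oc(SOC) in series with an internal resistance driving a time-varying resistive load, with SOC tracked by coulomb counting and accuracy summarized by RMSE against a stand-in reference trace. All parameter values are assumptions for illustration.

```python
import numpy as np

R_INT = 0.05          # ohm, assumed internal resistance
CAPACITY_AH = 2.5     # assumed battery capacity
DT = 1.0              # simulation step in seconds

def v_oc(soc):
    """Assumed open-circuit voltage curve as a function of SOC (0..1)."""
    return 3.0 + 1.2 * soc

def simulate(load_resistance, steps=3600, soc0=1.0):
    soc, records = soc0, []
    for k in range(steps):
        r_load = load_resistance(k)                  # time-varying resistive load
        i = v_oc(soc) / (R_INT + r_load)             # series-circuit current
        v_term = v_oc(soc) - i * R_INT               # terminal voltage
        soc -= i * DT / (CAPACITY_AH * 3600.0)       # coulomb counting
        records.append((v_term, max(soc, 0.0)))
    return np.array(records)

load = lambda k: 2.0 + 0.5 * np.sin(2 * np.pi * k / 600.0)   # assumed dynamic load profile
trace = simulate(load)
reference_soc = trace[:, 1] + 0.01 * np.random.default_rng(0).normal(size=len(trace))
rmse = np.sqrt(np.mean((trace[:, 1] - reference_soc) ** 2))
print("final SOC:", trace[-1, 1], "RMSE vs reference:", rmse)
```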
{"title":"Battery Modelling and Performance Analysis of Time-Varying Load Using Simulink","authors":"Mahesh Naggarapu, Shaik Shakeera, H. Venkataraman","doi":"10.1109/ESDC56251.2023.10149874","DOIUrl":"https://doi.org/10.1109/ESDC56251.2023.10149874","url":null,"abstract":"In recent years, each area has emerged with autonomous capabilities such as vehicular navigation, aquaculture, and industrial appliances. Especially, in the field of vehicular transportation, power is the main constraint for autonomous operations. The battery is the main source of energy or power storage. Especially, rechargeable batteries are potentially utilized as energy storage systems due to their high energy density. However, battery modelling is an indispensable tool for designing a real-time battery management system (BMS) that estimates the run-life time of autonomous battery power systems. In this paper, an easy-to-use battery Simulink model with time-varying dynamic load has been designed as a tool for all autonomous vehicles to estimate the State of Charge (SOC). This proposed model comprises a Controlled-Voltage source in series with internal resistance and time-varying resistive load. The proposed model differs from the ideal model by 0.01 (1%) and mirrors the general behaviour of the ideal model. The performance analysis of both the ideal and proposed model is evaluated by Root Mean Square Error (RMSE) which must be less than 0.4 to accept the battery model. However, the proposed model achieved the RMSE value of 0.1 and estimates SOC which is widely acceptable.","PeriodicalId":354855,"journal":{"name":"2023 11th International Symposium on Electronic Systems Devices and Computing (ESDC)","volume":"347 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125775125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Video understanding: Tagging of videos through self attentive learnable key descriptors
Authors: Narayana Darapaneni, A. Paduri, Dinu Thomas, Jisha C U, Abhinao Shrivastava, Seema Biradar
Pub Date: 2023-05-04 | DOI: 10.1109/ESDC56251.2023.10149869
In today's world, user-generated content (UGC) videos have increased exponentially. Billions of videos are uploaded, played, and exchanged between different actors. In this context, automatic video content classification has become a critical and challenging problem, especially in areas such as video-based search and recommendation. In this work, we extract frame-level visual and audio features; these pre-extracted features are then converted into a compact video-level representation effectively and efficiently. We aim to classify each video into a set of categories with high accuracy. From the literature survey, we identified that video tagging is a problem that has not yet reached maturity, and much research is ongoing in this area. It is observed that clustering-based video description methodologies show better results than temporal algorithms. We have also identified that the majority of state-of-the-art (SOTA) techniques use the Vector of Locally Aggregated Descriptors (VLAD) technique to extract video features and make the codebook learnable through the adjustments introduced in NetVLAD. The key descriptors are mostly noisy, and many of them are insignificant. In this work, we aim to cascade a self-attention block onto NetVLAD to extract the significant descriptors and filter out the noise. The YouTube-8M dataset is used to train the model, and performance is compared with other SOTA techniques. As in similar works, model performance is measured by the Global Average Precision (GAP) metric over all predicted video labels. We aim to achieve a GAP score close to 85% for this work.
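A sketch of the GAP metric as used in YouTube-8M style evaluation: the top-k predictions from every video are pooled, sorted by confidence, and a single average precision is computed over the pooled list. The toy predictions below are illustrative only.

```python
import numpy as np

def global_average_precision(predictions, top_k=20):
    """predictions: list of (scores, true_labels) per video, where scores is a
    dict label->confidence and true_labels is a set of ground-truth labels."""
    pooled = []      # (confidence, is_correct) pairs across all videos
    n_positives = 0
    for scores, true_labels in predictions:
        n_positives += len(true_labels)
        top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        pooled.extend((conf, label in true_labels) for label, conf in top)
    pooled.sort(key=lambda x: x[0], reverse=True)
    hits, ap = 0, 0.0
    for i, (_, correct) in enumerate(pooled, start=1):
        if correct:
            hits += 1
            ap += hits / i            # precision at this recall point
    return ap / max(n_positives, 1)

toy = [
    ({"cat": 0.9, "dog": 0.4, "car": 0.2}, {"cat"}),
    ({"car": 0.8, "dog": 0.7, "cat": 0.1}, {"car", "dog"}),
]
print("GAP:", global_average_precision(toy))
```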
{"title":"Video understanding : Tagging of videos through self attentive learnable key descriptors","authors":"Narayana Darapaneni, A. Paduri, Dinu Thomas, Jisha C U, Abhinao Shrivastava, Seema Biradar","doi":"10.1109/ESDC56251.2023.10149869","DOIUrl":"https://doi.org/10.1109/ESDC56251.2023.10149869","url":null,"abstract":"In today’s world, the UGC (User Generated Contents) videos have increased exponentially. Billions of videos are uploaded, played and exchanged between different actors. In this context, automatic video content classification has become a critical and challenging problem, especially in areas like video-based search, recommendation etc. In this work we try to extract frame-level visual and audio features, pre-extracted features are then converted into a compact video level representation effectively and efficiently. We aim to classify the video into a set of categories with high accuracy. From the literature survey, we identified that, the tagging of videos has been a problem which has not reached its maturity yet, and there are many researches happening in this area. It is observed that, the clustering based video description methodologies show a better result compared to the temporal algorithms. We also have identified that, majority of the SOTA techniques use the VLAD (Vector of Locally Aggregated Descriptors) technique to extract the video features and make the codebook learnable through some adjustments introduced in the NetVLAD. The key descriptors would be mostly noisy, and many of them are insignificant. In this work we aim to cascade a Self-Attention Block on the NetVLAD which can extract the significant descriptors and filter out the Noise. The YouTube 8M dataset shall be used for training the model and performance will be compared with other SOTA techniques. Like other similar works, model performance will be measured by GAP Metric (Global Average Precision) for all the videos predicted labels. We aim to achieve a GAP score close to 85% for this work.","PeriodicalId":354855,"journal":{"name":"2023 11th International Symposium on Electronic Systems Devices and Computing (ESDC)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126513519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}