Pub Date : 1900-01-01 DOI: 10.54364/aaiml.2022.1119
Matthew Avaylon, R. Sadre, Zhen Bai, T. Perciano
Semantic segmentation algorithms based on deep learning architectures have been applied to a diverse set of problems. Consequently, new methodologies have emerged to push the state of the art in this field forward, and the need for powerful, user-friendly software has increased significantly. The combination of Conditional Random Fields (CRFs) and Convolutional Neural Networks (CNNs) has boosted the results of pixel-level classification predictions. Recent work using a fully integrated CRF-RNN layer has shown strong advantages over the base models in segmentation benchmarks. Despite this success, the rigidity of these frameworks prevents broad adaptation to complex scientific datasets and presents challenges in optimally scaling these models. In this work, we introduce a new encoder-decoder system that overcomes both of these issues. We adapt multiple CNNs as encoders, exposing multiple function parameters so that the models can be structured according to the targeted dataset and scientific problem. We leverage the flexibility of the U-Net architecture to act as a scalable decoder. The CRF-RNN layer is integrated into the decoder as an optional final layer, keeping the entire system fully compatible with back-propagation. To evaluate the performance of our implementation, we performed experiments on the Oxford-IIIT Pet Dataset and on experimental scientific data acquired via micro-computed tomography (microCT), revealing the adaptability of this framework and the performance benefits of a fully end-to-end CNN-CRF system on both experimental and benchmark datasets.
Title: Adaptable Deep Learning and Probabilistic Graphical Model System for Semantic Segmentation
Journal: Adv. Artif. Intell. Mach. Learn.
Pub Date : 1900-01-01 DOI: 10.54364/aaiml.2023.1146
Philipp Hofer, Michael Roland, R. Mayrhofer, Philipp Schwarz
Biometric data are among the most privacy-sensitive kinds of data. Ubiquitous authentication systems with a focus on privacy favor decentralized approaches, as they reduce potential attack vectors on both a technical and an organizational level. The gold standard is to let users control where their own data is stored, which consequently leads to a wide variety of devices being used. Moreover, in comparison with a centralized system, designs with greater end-user freedom often incur additional network overhead. Therefore, when using face recognition for biometric authentication, an efficient way to compare faces is important in practical deployments, because it reduces both the network and the hardware requirements, which is essential to encourage device diversity. This paper proposes an efficient way to aggregate embeddings used for face recognition, based on an extensive analysis of different datasets and different aggregation strategies. As part of this analysis, a new dataset has been collected, which is available for research purposes. Our proposed method supports the construction of massively scalable, decentralized face recognition systems with a focus on both privacy and long-term usability.
Title: Optimizing Distributed Face Recognition Systems through Efficient Aggregation of Facial Embeddings
Journal: Adv. Artif. Intell. Mach. Learn.
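The abstract above does not say which aggregation strategy the authors propose, so the following is only a minimal sketch of one common baseline: averaging several embeddings of the same person into a single unit-length template and comparing a probe against it by cosine similarity. The function names, the toy 3-dimensional vectors, and the choice of mean aggregation are all assumptions made for illustration, not the paper's method.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0 else [x / norm for x in vec]


def aggregate_mean(embeddings):
    """Average several embeddings of one person into a single unit-length template."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    mean = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    return l2_normalize(mean)


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


# Toy 3-dimensional embeddings; real face embeddings have far more dimensions.
enrolled = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [1.0, 0.0, 0.1]]
template = aggregate_mean(enrolled)

probe_same = [0.85, 0.15, 0.05]   # hypothetical probe of the enrolled person
probe_other = [0.0, 0.2, 0.9]     # hypothetical probe of someone else

print(cosine_similarity(template, probe_same))
print(cosine_similarity(template, probe_other))
```

With a single template per person, a device stores and transmits one vector instead of one per enrolled image, which is the kind of network and hardware saving the abstract refers to.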
Pub Date : 1900-01-01 DOI: 10.54364/aaiml.2022.1120
W. Zhang, Ilya Reznik
For companies that serve corporate customers, a Customer Service Outage (CSO) is a catastrophic event that may lead to the loss of some customer data. After each CSO, it is important to have a timely and quantitative measurement of how much data was lost. However, it is impractical for humans to do this manually because of the enormous amount of data involved. In this paper, we present a robust solution that can return a numerical loss report within hours. It handles a variety of challenges associated with the data. Consequently, the management team can gauge the severity of data loss right after each event and respond accordingly.
Title: Estimating Data Loss At Scale
Journal: Adv. Artif. Intell. Mach. Learn.
Pub Date : 1900-01-01 DOI: 10.54364/aaiml.2023.1158
Steven Moonen
Computer vision systems are becoming more widespread in the manufacturing industry for automating tasks. As these vision systems increasingly rely on machine learning as opposed to classic vision algorithms, streamlining the process of creating training datasets becomes more important. Creating large labeled datasets is a tedious, time-consuming, and therefore expensive process, especially in a low-volume, high-variance manufacturing environment. To reduce the cost of creating training datasets, we introduce CAD2Render, a GPU-accelerated synthetic data generator based on the Unity High Definition Render Pipeline
Title: CAD2Render: A Synthetic Data Generator for Training Object Detection and Pose Estimation Models in Industrial Environments
Journal: Adv. Artif. Intell. Mach. Learn.
Pub Date : 1900-01-01 DOI: 10.54364/aaiml.2023.1166
Xuanyue Yang, Wenting Ye, Luke Breitfeller, Tianwei Yue, Wenping Wang
The field of coreference resolution has witnessed significant advancements since the introduction of deep learning-based models. In this paper, we replicate the state-of-the-art coreference resolution model and perform a thorough error analysis. We identify a potential limitation of the current approach in its treatment of grammatical constructions within sentences. Furthermore, the model struggles to leverage contextual information across sentences, resulting in suboptimal accuracy when resolving mentions that span multiple sentences. Motivated by these observations, we propose an approach that integrates linguistic information throughout the entire architecture. Our contributions include multitask learning with part-of-speech (POS) tagging, supervision of intermediate scores, and self-attention mechanisms that operate across sentences. By incorporating these linguistic-inspired modules, we not only achieve a modest improvement in the F1 score on the CoNLL 2012 dataset, but also perform a qualitative analysis to ascertain whether our model invisibly surpasses the baseline performance. Our findings demonstrate that our model successfully learns linguistic signals that are absent in the original baseline. We posit that these enhancements may have gone undetected due to annotation errors, but they nonetheless lead to a more accurate understanding of coreference resolution.
Title: Linguistically-Inspired Neural Coreference Resolution
Journal: Adv. Artif. Intell. Mach. Learn.
Pub Date : 1900-01-01 DOI: 10.54364/aaiml.2022.1129
Mrouj Almuhajri, C. Suen
The rapid advancements in artificial intelligence algorithms have sharpened the focus on street signs due to their prevalence. Some street signs, such as traffic signs, have consistent shapes and pre-defined colors and fonts, while others, like shop signboards, are characterized by their visual variability. These variations make it challenging for AI-based systems to classify them. In this paper, the annotations of the ShoS dataset were extended to include more attributes for shop classification. Then, two classifiers were trained and tested on the extended ShoS dataset. The SVM performed well, with an F1-score of 89.33%. The classification performance was compared with human performance, and the results showed that our classifier exceeded human performance by about 15%. The results were discussed, and the factors that affect classification were identified to support further enhancement.
Title: AI Based Approach for Shop Classification and a Comparative Study with Human
Journal: Adv. Artif. Intell. Mach. Learn.
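For readers unfamiliar with the metric reported above, the F1-score is the harmonic mean of precision and recall. The sketch below computes it for a single class from counts of true positives, false positives, and false negatives. The counts are made up for illustration and are not taken from the paper, and the abstract does not say how the per-class scores were averaged, so the single-class form shown here is an assumption.

```python
def f1_score(tp, fp, fn):
    """F1 for one class from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 signboards classified correctly, 2 wrongly assigned
# to this class, 4 of this class missed.
score = f1_score(tp=8, fp=2, fn=4)
print(f"F1 = {score:.4f}")
```

The same value can be obtained directly as 2·tp / (2·tp + fp + fn), which is a convenient check on the two-step computation.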
Pub Date : 1900-01-01 DOI: 10.54364/aaiml.2022.1131
Gautam Srivastava
Fifth generation (5G) communication provides high-speed data transfer and low latency, serving multiple heterogeneous applications better than previous technologies. This has led to the development of heterogeneous networks (HetNets), which integrate diverse information and communication technologies (ICTs) into a single view to provide better quality of service (QoS) for different classes of users [1]. HetNets are poised to see more widespread acceptance as a means of communication favored in 5G and beyond, as we look to 6G as well. The fundamental components of a 5G HetNet scenario include the user equipment (UE) and the enhanced node B (eNB). Communication between UEs is facilitated by eNBs, which act as gateways for inward and outward data exchange. User equipment is classified under pico or macro cells in a HetNet, whereas the integration of 5G supports distributed communication modes through device-to-device (D2D) features. Along with the support of eNBs, interference cancellation, carrier aggregation, massive multi-input multi-output, and coordinated transmissions are facilitated in this network to meet the QoS requirements of different applications and users [2, 3]. The interoperable nature of the heterogeneous platform provides pervasive and anonymous access to resources and UE communications. In such a pervasive access scenario, security becomes a prime concern due to the interruptions in D2D communications [4]. Unauthorized devices or adversaries target the exchanged data to inject malicious or falsified content, changing its freshness and reliability. Therefore, authentication-centric solutions are designed for data security, along with integrity checks to ensure that transmitted data is delivered at the receiver end [5]. Globally, the data security and privacy of the Internet of Things (IoT) has been a concern to all users. As more and more individuals conduct their day-to-day lives on mobile devices, they also find themselves sharing personal information over open channels. Robust data authentication and efficient key management are incorporated into the heterogeneous communication platform to strengthen the security of data exchange and preserve user data security and privacy. Key management and hash-based authentication methods are designed with less complexity to reduce the computational and communication overheads, along with lower latency to support the design goals of 5G environments. Therefore, the adaptiveness of the authentication method is required to be two-fold, namely user-centric and application-centric, as guided by the service and security provider. We have seen many different areas fuse to offer strong authentication methods in 5G. These tend to include Artificial Intelligence, Machine Learning, Deep Learning, and more recently, blockchain technology [6]. Artificial intelligence techniques tend to
Title: The Emergence of Heterogeneous Networks
Journal: Adv. Artif. Intell. Mach. Learn.
Pub Date : 1900-01-01 DOI: 10.54364/aaiml.2022.1134
Anika Saxena, Deepesh Chugh, H. Mittal, Mohammad Sajid, Ritu Chauhan, Eiad Yafi, Jian Cao, Mukesh Prasad
A novel feature selection approach is presented in this paper. Sammon’s Stress Function transforms high-dimensional data into a lower-dimensional data set. A data set is divided into small partitions, and the features are assigned randomly to these partitions. Using a GA with the Sammon Error as the fitness value, a small, desired number of features is selected from every partition. The combination of the reduced feature subsets from these partitions is again divided into small partitions. After a certain number of iterations of this process, the desired small number of features is obtained. For experimental validation, the proposed method has been tested on 11 standard datasets with three classifiers, namely Decision Tree, MLP, and KNN. The classification accuracies obtained by the proposed method are the highest on most of the considered datasets when compared with the results reported in the literature. Moreover, the proposed method selects comparatively fewer features than the considered methods. The promising results obtained from the proposed method justify its strength.
Title: A Novel Unsupervised Feature Selection Approach Using Genetic Algorithm on Partitioned Data
Journal: Adv. Artif. Intell. Mach. Learn.
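The partition-and-select loop described in the abstract above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the Sammon error of a candidate subset is measured against the full feature set of its own partition, the GA uses truncation selection with a simple set-union crossover and a single-swap mutation, and every name and parameter value (`ga_select`, `part_size`, `k_per_part`, population size, mutation rate) is hypothetical.

```python
import itertools
import math
import random


def pairwise_distances(rows, features):
    """Euclidean distances between all row pairs, using only the given feature indices."""
    return [math.sqrt(sum((a[f] - b[f]) ** 2 for f in features))
            for a, b in itertools.combinations(rows, 2)]


def sammon_error(rows, subset, reference):
    """Sammon's stress of the `subset` feature space relative to the `reference` space."""
    d_ref = pairwise_distances(rows, reference)
    d_sub = pairwise_distances(rows, subset)
    pairs = [(r, s) for r, s in zip(d_ref, d_sub) if r > 0]  # skip coincident points
    total = sum(r for r, _ in pairs)
    if total == 0:
        return 0.0
    return sum((r - s) ** 2 / r for r, s in pairs) / total


def ga_select(rows, partition, k, rng, generations=20, pop_size=12):
    """Tiny GA over k-feature subsets of `partition`; lower Sammon error is fitter."""
    if len(partition) <= k:
        return sorted(partition)

    def fitness(ind):
        return sammon_error(rows, ind, partition)

    population = [tuple(sorted(rng.sample(partition, k))) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: pop_size // 2]       # truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            pool = sorted(set(a) | set(b))          # crossover: draw genes from both parents
            child = set(rng.sample(pool, k))
            if rng.random() < 0.3:                  # mutation: swap one feature for an unused one
                unused = [f for f in partition if f not in child]
                if unused:
                    child.remove(rng.choice(sorted(child)))
                    child.add(rng.choice(unused))
            children.append(tuple(sorted(child)))
        population = parents + children
    return list(min(population, key=fitness))


def select_features(rows, n_features, part_size, k_per_part, target, seed=0):
    """Repeat partition-and-select until at most `target` features remain (or no progress)."""
    rng = random.Random(seed)
    features = list(range(n_features))
    while len(features) > target:
        rng.shuffle(features)                       # random assignment of features to partitions
        partitions = [features[i:i + part_size] for i in range(0, len(features), part_size)]
        reduced = [f for p in partitions for f in ga_select(rows, p, k_per_part, rng)]
        if len(reduced) >= len(features):           # these settings cannot shrink the set further
            break
        features = reduced
    return sorted(features)


# Toy data: 30 samples with 12 random features (purely illustrative).
data_rng = random.Random(42)
rows = [[data_rng.random() for _ in range(12)] for _ in range(30)]
selected = select_features(rows, n_features=12, part_size=4, k_per_part=2, target=4)
print(selected)
```

No class labels are used anywhere in the loop, which is consistent with the unsupervised setting named in the title; the classifiers mentioned in the abstract would only enter afterwards, when the selected features are evaluated.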
Pub Date : 1900-01-01 DOI: 10.54364/aaiml.2023.1162
Omer Abdulhaleem Naser, S. M. S. Ahmad, K. Samsudin, M. Hanafi
Facial recognition systems often struggle with detecting faces in poses that deviate from the frontal view. Therefore, this paper investigates the impact of variations in yaw pose on the accuracy of facial recognition systems and presents a robust approach optimized to detect faces with pose variations ranging from 0° to ±90°. The proposed system integrates MTCNN, FaceNet, and SVC, and is trained and evaluated on the Taiwan dataset, which includes face images with diverse yaw poses. The training dataset consists of 89 subjects, with approximately 70 images per subject, and the testing dataset consists of 49 subjects, each with approximately 5 images. Our system achieved a training accuracy of 99.174% and a test accuracy of 96.970%, demonstrating its effectiveness in detecting faces with pose variations. These findings suggest that the proposed approach can be a valuable tool for improving facial recognition accuracy in real-world scenarios.
Title: Investigating the Impact of Yaw Pose Variation on Facial Recognition Performance
Journal: Adv. Artif. Intell. Mach. Learn.