ML-Driven Facial Synthesis from Spoken Words Using Conditional GANs
Pub Date : 2024-01-25 DOI: 10.59256/ijire.20240501004
Vaishnavi Srivastava, Sakshi Srivastava, Sakshi Chauhan, Divyakshi Yadav
The human brain can translate a person's voice into a corresponding face image, even for someone never seen before. Training a deep learning network to do the same can be used to detect human faces from their voices, for example to find a criminal for whom only a voice recording exists. The goal of this paper is to build a Conditional Generative Adversarial Network that produces face images from human speech; the generated images can then be passed to a face recognition model to identify the owner of the speech. After training, the face recognition model gave an accuracy of 80.08% on the training set and 56.2% on the test set. Compared to a basic GAN model, this model improved the results by about 30%. Key Word: Face image synthesis, Generative adversarial network, Face recognition
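The abstract does not give the network architecture, but the core idea of a conditional GAN whose generator is conditioned on a voice embedding can be sketched in a few lines of PyTorch. This is a minimal sketch, assuming a pretrained speech encoder supplies a fixed-size voice embedding; the layer sizes, `noise_dim`, and `voice_dim` are illustrative assumptions, not the paper's reported configuration.

```python
import torch
import torch.nn as nn

class VoiceConditionedGenerator(nn.Module):
    """DCGAN-style generator conditioned on a voice embedding.
    All dimensions are illustrative assumptions, not the paper's values."""
    def __init__(self, noise_dim: int = 100, voice_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            # Project the (noise + voice) vector up to a 4x4 feature map,
            # then upsample stage by stage to a 64x64 RGB face image.
            nn.ConvTranspose2d(noise_dim + voice_dim, 512, 4, 1, 0),
            nn.BatchNorm2d(512), nn.ReLU(True),
            nn.ConvTranspose2d(512, 256, 4, 2, 1),
            nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1),
            nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1),
            nn.BatchNorm2d(64), nn.ReLU(True),
            nn.ConvTranspose2d(64, 3, 4, 2, 1),
            nn.Tanh(),  # pixel values in [-1, 1]
        )

    def forward(self, noise: torch.Tensor, voice_emb: torch.Tensor) -> torch.Tensor:
        # Conditioning step: concatenate the voice embedding with the noise vector.
        z = torch.cat([noise, voice_emb], dim=1)
        return self.net(z.unsqueeze(-1).unsqueeze(-1))

# Example: one 64x64 face image from random noise and a dummy voice embedding.
gen = VoiceConditionedGenerator()
fake_face = gen(torch.randn(1, 100), torch.randn(1, 256))  # shape (1, 3, 64, 64)
```

In a conditional setup like this, the discriminator would receive the same voice embedding alongside real or generated face images, which is what distinguishes it from the basic GAN baseline the paper compares against.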
{"title":"ML-Driven Facial Synthesis from Spoken Words Using Conditional GANs","authors":"Vaishnavi Srivastava, Sakshi Srivastava, Sakshi Chauhan, Divyakshi Yadav","doi":"10.59256/ijire.20240501004","DOIUrl":"https://doi.org/10.59256/ijire.20240501004","url":null,"abstract":"A Human Brain may translate a person's voice to its corresponding face image even if never seen before. Training adeep learning network to do the same can be used in detecting human faces based on their voice, which may be used in findinga criminal that we only have a voice recording for. The goal in this paper is to build a Conditional Generative Adversarial Network that produces face images from human speeches which can then be recognized by a face recognition model to identifythe owner of the speech. The model was trained, and the face recognition model gave an accuracy of 80.08% in training and 56.2% in testing. Compared to the basic GAN model, this model has improved the results by about 30%. Key Word: Face image synthesis, Generative adversarial network, Face Recognition","PeriodicalId":516932,"journal":{"name":"International Journal of Innovative Research in Engineering","volume":"78 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140495923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BLYNK RFID and Retinal Lock Access System
Pub Date : 2024-01-24 DOI: 10.59256/ijire.20240501003
Yoheswari S, Adhithyaram L, Gokulesh S, Harish Raj K.B, Jivithesh Harshaa R D
The BLYNK RFID and Retinal Lock Access System describes a digital door lock system built around an ESP32-CAM module, a budget-friendly development board with a very small camera and a micro-SD card slot. The system uses retinal recognition technology to identify the retina of the person who wants to open the door. The AI-Thinker ESP32-CAM module takes pictures of the person and sends them to the owner via the BLYNK application installed on their mobile phone. The owner can then grant access to the door based on the person's identity. When deploying the BLYNK RFID and retinal scanner project, it is important to consider scalability and maintenance: the user base and access requirements may change over time, so plan for future expansion and updates. Regularly review and update the system's firmware, libraries, and security measures to stay ahead of potential vulnerabilities and evolving best practices in access control. Monitoring and auditing the system's usage is crucial; the Blynk platform can help gather data on access attempts and system performance, so it can be analyzed for anomalies and potential security breaches. This data is valuable for compliance, troubleshooting, and performance optimization. Key Word: Retinal and RFID scanning to authenticate users, using an ESP32-CAM and RFID reader controlled through BLYNK.
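As a rough illustration of the access flow described above, here is a minimal Python sketch of the two-factor decision logic. Every helper function below is a hypothetical stub standing in for the RFID-reader, ESP32-CAM, and Blynk calls; none of this is the real Blynk or ESP32 API.

```python
# Hypothetical sketch of the two-factor door-access flow; every helper
# here is a placeholder stub, not a real Blynk or ESP32 library call.

AUTHORIZED_TAGS = {"04:A3:1F:2C"}  # example RFID UIDs enrolled by the owner

def scan_rfid() -> str:
    """Stub: would return the tag UID read by the RFID reader."""
    return "04:A3:1F:2C"

def capture_image() -> bytes:
    """Stub: would return a JPEG frame from the ESP32-CAM."""
    return b""

def send_to_owner_via_blynk(image: bytes) -> None:
    """Stub: would push the photo to the owner's phone through Blynk."""

def wait_for_owner_decision(timeout_s: int) -> bool:
    """Stub: would block until the owner grants or denies access."""
    return False

def request_access() -> bool:
    if scan_rfid() not in AUTHORIZED_TAGS:        # factor 1: RFID tag
        return False
    send_to_owner_via_blynk(capture_image())      # factor 2: photo to owner
    return wait_for_owner_decision(timeout_s=30)  # owner grants or denies

print("unlock" if request_access() else "deny")   # actuator call would go here
```

Denied attempts would also be logged, which is what supplies the audit trail the abstract recommends for monitoring and compliance.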
{"title":"BLYNK RFID and Retinal Lock Access System","authors":"Yoheswari S, Adhithyaram L, Gokulesh S, Harish Raj K.B, Jivithesh Harshaa R D","doi":"10.59256/ijire.20240501003","DOIUrl":"https://doi.org/10.59256/ijire.20240501003","url":null,"abstract":"The BLYNK RFID AND RETINAL LOCKACCESS SYSTEM describes a digital door lock system that uses an ESP32-CAM module, which is a budget friendly development board with a very small size camera and a micro-SD card slot. The system uses retinal recognition technology to detect the retinal of the person who wants to access the door. The AI-Thinker ESP32-CAM module takes pictures of the person and sends them to the owner via the BLYNK application installed on their mobile phone. The owner can then grant permission to access the door based on the person’s identity. When deploying your BLYNK RFID and retinal scanner project, it's important to consider scalability and maintenance. As your user base and access requirements may change over time, plan for future expansion and updates. Regularly review and update your system's firmware, libraries, and security measures to stay ahead of potential vulnerabilities and evolving best practices in access control. Monitoring and auditing your system's usage is crucial. The Blynk platform can help you gather data on access attempts and system performance, allowing you to analyze the data for any anomalies and potential security breaches. This data can be valuable for compliance, troubleshooting, and performance optimization. Key Word: retinal and RFID scanning for lock to authentic users, using an ESP32-CAM and RFID reader controlling through BLYNK.","PeriodicalId":516932,"journal":{"name":"International Journal of Innovative Research in Engineering","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140497224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Custom Voice Cloner
Pub Date : 2024-01-19 DOI: 10.59256/ijire.20240501002
Usharani K, Nandha kumaran H, Nikhilesh Pranav M.S, Nithish kumar K.K, Prasanna Krishna A.S
The Custom Voice Cloner is based on a voice-signal speech synthesizer: a technology that converts text into audible speech, simulating human speech characteristics such as pitch and tone. It finds applications in virtual assistants, navigation systems, and accessibility tools. Building one in Python typically involves Text-to-Speech (TTS) libraries such as gTTS, pyttsx3, or platform-specific options for Windows and macOS, which offer easy text-to-speech conversion. However, TTS libraries may lack the customization and voice quality needed for advanced projects. For more sophisticated applications, custom voice synthesizers can be built using deep learning models such as Tacotron and WaveNet, which learn speech nuances for more natural output. Creating a custom voice synthesizer is challenging, requiring high-quality training data, machine learning expertise, and substantial computational resources; it goes beyond generating speech to conveying emotion and nuances of pronunciation for natural, expressive voices. Key Word: Voice signal speech synthesizer, text-to-speech conversion, deep learning, TTS, gTTS, pyttsx3, etc.
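The two library routes the abstract names can be shown in a few lines. This is a minimal sketch assuming the gTTS and pyttsx3 packages are installed; gTTS requires an internet connection (it calls Google's TTS service), while pyttsx3 runs offline through the platform's speech engine.

```python
# Minimal sketch of the two TTS routes mentioned above.
from gtts import gTTS
import pyttsx3

# Online route: render text to an MP3 file via Google's TTS service.
gTTS("Hello, this is a synthesized voice.", lang="en").save("hello.mp3")

# Offline route: speak directly through the system's speech engine.
engine = pyttsx3.init()
engine.setProperty("rate", 150)  # speaking rate in words per minute
engine.say("Hello, this is a synthesized voice.")
engine.runAndWait()
```

As the abstract notes, neither route clones a specific voice; that requires training models such as Tacotron or WaveNet on recordings of the target speaker.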
{"title":"Custom Voice Cloner","authors":"Usharani K, Nandha kumaran H, Nikhilesh Pranav M.S, Nithish kumar K.K, Prasanna Krishna A.S","doi":"10.59256/ijire.20240501002","DOIUrl":"https://doi.org/10.59256/ijire.20240501002","url":null,"abstract":"The Custom Voice Cloner is based on voice signal speech synthesizer. It is a technology that converts text into audible speech, simulating human speech characteristics like pitch and tone. It finds applications in virtual assistants, navigation systems, and accessibility tools. Building one in Python typically involves Text-to-Speech (TTS) libraries such as gTTS, pyttsx3, or platform-specific options for Windows and macOS, offering easy text-to-speech conversion.However, TTS libraries might lack customization and voice quality needed for advanced projects. For more sophisticated applications, custom voice synthesizers can be built using deep learning techniques like Tacotron and WaveNet. These models learn speech nuances for more natural output.Creating a custom voice synthesizer is challenging, requiring high-quality training data, machine learning expertise, and substantial computational resources. It goes beyond generating speech to convey emotions and nuances in pronunciation for natural and expressive voices. Key Word: Voice signal speech synthesizer,text-to-speech conversion, deep learning,TTS, gTTS, pyttsx3,etc.","PeriodicalId":516932,"journal":{"name":"International Journal of Innovative Research in Engineering","volume":"427 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140502508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Embedding Artificial Intelligence for Personal Voice Assistant Using NLP
Pub Date : 2024-01-06 DOI: 10.59256/ijire.20240501001
Maria Sobana S, R. M, Rajkumar R K, Rajkumar M, Siddarthan S
A voice assistant is software that provides a detailed voice-based response to an instruction given in a prompt. The seamless integration of quick responses to queries and up-to-date weather information enhances daily routines, promoting efficiency and convenience. To achieve these capabilities, technologies such as NLTK, pyttsx3, and speech recognition libraries play a pivotal role. In summary, the convergence of these tools is gradually transforming the futuristic concept of an indispensable personal assistant into an attainable reality. AI technologies have revolutionized digital assistant interactions, but as they integrate into daily life, addressing bias, ambiguity, and ethics becomes crucial. Key Word: Integration; Convergence; Futuristic; Indispensable
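Here is a minimal sketch of the listen-understand-respond loop such an assistant builds on, assuming the SpeechRecognition and pyttsx3 packages (plus PyAudio for microphone access). The keyword-based intent check is a toy stand-in, not the authors' NLP pipeline.

```python
# Minimal listen -> understand -> respond loop for a voice assistant.
import pyttsx3
import speech_recognition as sr

engine = pyttsx3.init()
recognizer = sr.Recognizer()

def speak(text: str) -> None:
    """Voice-based output through the offline pyttsx3 engine."""
    engine.say(text)
    engine.runAndWait()

# Capture one spoken instruction from the microphone.
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)
    audio = recognizer.listen(source)

try:
    command = recognizer.recognize_google(audio).lower()  # online speech-to-text
    if "weather" in command:                              # toy intent matching
        speak("Fetching today's weather for you.")
    else:
        speak(f"You said: {command}")
except sr.UnknownValueError:
    speak("Sorry, I did not catch that.")
```

A real assistant would replace the keyword check with NLP-based intent classification, for example using NLTK tokenization and a trained classifier, and would call a weather API to answer the query.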
{"title":"Embedding Artificial Intelligence for Personal Voice Assistant Using NLP","authors":"Maria Sobana S, R. M, Rajkumar R K, Rajkumar M, Siddarthan S","doi":"10.59256/ijire.20240501001","DOIUrl":"https://doi.org/10.59256/ijire.20240501001","url":null,"abstract":"The voice assistance is an software which is able to provide a detailed response as a voice based output according to an instruction in a prompt. To seamless integration of quick responses to queries and up-to-date weather information enhances daily routines, promoting efficiency and convenience. To achieve these capabilities, technologies like NLTK, pyttsx3, and speech recognition libraries play a pivotal role. To summarize, the convergence of these tools is gradually transforming the futuristic concept of an indispensable personal assistant into an attainable reality. AI technologies have revolutionized digital assistant interactions, but as they integrate into daily life, addressing bias, ambiguity, and ethics becomes crucial. Key Word: Integration; Convergence; Futuristic; Indispensable;","PeriodicalId":516932,"journal":{"name":"International Journal of Innovative Research in Engineering","volume":"17 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140513048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}