
24th Irish Machine Vision and Image Processing Conference: Latest Publications

An NLP approach to Image Analysis
Pub Date: 2022-08-31 DOI: 10.56541/kfbi5107
G. Martínez
In Natural Language Processing, word frequency combined with word distribution yields a precise indicator of lexical relevance, a measure of great value in the context of Information Retrieval. This kind of keyword detection exploits the structural properties of text, notably as revealed by Zipf's Law, which describes frequency distribution as a 'long-tailed' phenomenon. Can such properties be found in images? If so, can they serve to distinguish high-information items (particular colours coded as RGB values) from low-information items? To explore this possibility, we apply NLP algorithms to a corpus of satellite images to extract linguistic-style features from the bitmaps, augment the original corpus with distributional information about its RGB values, and observe whether this addition improves accuracy throughout a Machine Learning pipeline tested with several Transfer Learning models.
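The rank-frequency measurement behind this idea is easy to prototype. A minimal sketch follows (not the authors' code; the input file name and the Zipf-exponent fit are assumptions): each distinct RGB triple in a bitmap is treated as a 'word', and the slope of the log-log rank-frequency curve indicates how 'long-tailed' the colour distribution is.

```python
# Sketch: test whether RGB "token" frequencies in an image are Zipf-like.
# Assumes Pillow and numpy; the image path is hypothetical.
import numpy as np
from PIL import Image

def rgb_rank_frequency(path):
    """Count each distinct RGB triple as a 'word' and return its
    frequencies sorted by rank, the analogue of an NLP frequency table."""
    pixels = np.asarray(Image.open(path).convert("RGB")).reshape(-1, 3).astype(np.int64)
    # Pack each (R, G, B) triple into a single 24-bit integer token.
    tokens = (pixels[:, 0] << 16) | (pixels[:, 1] << 8) | pixels[:, 2]
    _, counts = np.unique(tokens, return_counts=True)
    return np.sort(counts)[::-1]

freqs = rgb_rank_frequency("satellite_tile.png")   # hypothetical input tile
ranks = np.arange(1, freqs.size + 1)
# Under Zipf's Law, log(frequency) falls roughly linearly with log(rank);
# the fitted slope is the Zipf exponent (about -1 for natural language).
slope, _ = np.polyfit(np.log(ranks), np.log(freqs), 1)
print(f"fitted Zipf exponent: {slope:.2f}")
```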
{"title":"An NLP approach to Image Analysis","authors":"G. Martínez","doi":"10.56541/kfbi5107","DOIUrl":"https://doi.org/10.56541/kfbi5107","url":null,"abstract":"In Natural Language Processing, measuring word frequency combined with word distribution can yield a precise indicator of lexical relevance, a measure of great value in the context of Information Retrieval. Such detection of keywords exploits the structural properties of text as revealed notably by Zipf’s Law which describes frequency distribution as a ‘long tailed’ phenomenon. Can such properties be found in images? If so, can they serve to distinguish high content items (particular colours coded as RGBs) from low information items? To explore this possibility, we have applied NLP algorithms to a corpus of satellite images in order to extract a number of linguistic-type features in bitmaps so as to augment the original corpus with distributional information regarding its RGBs and observe if this addition improves accuracy throughout a Machine Learning pipeline tested with several Transfer Learning models.","PeriodicalId":180076,"journal":{"name":"24th Irish Machine Vision and Image Processing Conference","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114978554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Reality Analagous Synthetic Dataset Generation with Daylight Variance for Deep Learning Classification
Pub Date: 2022-08-31 DOI: 10.56541/poya9239
Thomas Lee, Susan Mckeever, J. Courtney
For the implementation of autonomously navigating Unmanned Air Vehicles (UAVs) in the real world, it must be shown that safe navigation is possible in all real-world scenarios. For UAVs powered by Deep Learning algorithms, this is difficult to achieve, as the weak point of any trained network is its reduced predictive capacity when presented with unfamiliar input data. It is possible to train for more use cases, but this requires more data, which takes time and manpower to acquire. In this work, a potential solution to the manpower cost of exponentially scaling dataset size and complexity is presented: artificial image datasets are generated from a 3D-scanned recreation of a physical space, populated with 3D-scanned objects of a specific class. This simulation is then used to generate image samples that iterate temporally, resulting in a slice-able dataset containing time-varied components of the same class.
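The slice-able structure is the part that is simple to illustrate. In the sketch below (a toy stand-in with a hypothetical class name and a crude solar-elevation model, both assumptions; `render_scene` is a stub for the paper's 3D-scan-based simulator), one sample is generated per simulated time step and the collection can then be sliced by lighting condition.

```python
# Sketch: one metadata record per simulated sun position, sliceable by time.
from dataclasses import dataclass

@dataclass
class Sample:
    hour: float           # simulated time of day
    sun_elevation: float  # degrees above the horizon
    label: str            # object class present in the frame
    path: str             # where the rendered image would be written

def render_scene(hour, label):
    # Stub: a real implementation would position the sun in the scanned
    # scene and render to disk; here we only record the metadata.
    elevation = max(0.0, 90.0 - abs(hour - 12.0) * 15.0)  # crude solar arc
    return Sample(hour, elevation, label, f"render_{label}_{hour:04.1f}.png")

dataset = [render_scene(h / 2.0, "uav_marker")   # hypothetical class name
           for h in range(12, 41)]               # 06:00 to 20:00, 30 min steps

# "Slice-able": pull out only low-light samples as a targeted test set.
dusk_slice = [s for s in dataset if s.sun_elevation < 15.0]
print(len(dataset), "samples,", len(dusk_slice), "in the dusk slice")
```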
{"title":"Reality Analagous Synthetic Dataset Generation with Daylight Variance for Deep Learning Classification","authors":"Thomas Lee, Susan Mckeever, J. Courtney","doi":"10.56541/poya9239","DOIUrl":"https://doi.org/10.56541/poya9239","url":null,"abstract":"For the implementation of Autonomously navigating Unmanned Air Vehicles (UAV) in the real world, it must be shown that safe navigation is possible in all real-world scenarios. In the case of UAVs powered by Deep Learning algorithms, this is a difficult task to achieve, as the weak point of any trained network is the reduction in predictive capacity when presented with unfamiliar input data. It is possible to train for more use cases, however more data is required for this, requiring time and manpower to acquire. In this work, a potential solution to the manpower issues of exponentially scaling dataset size and complexity is presented, through the generation of artificial image datasets that are based off of a 3D scanned recreation of a physical space and populated with 3D scanned objects of a specific class. This simulation is then used to generate image samples that iterates temporally resulting in a slice-able dataset that contains time varied components of the same class.","PeriodicalId":180076,"journal":{"name":"24th Irish Machine Vision and Image Processing Conference","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124829159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Distance measurement between smartphones within an ad-hoc camera array, using audible PRBS
Pub Date: 2022-08-31 DOI: 10.56541/kebp4512
Pádraic McEvoy, D. Berry, Ted Burke
An approach for measuring the distance between two smartphones is presented in this paper. The method uses each smartphone's microphone(s) and speaker(s) to concurrently emit and record audio in order to calculate the sound propagation delay, and hence the distance. Each device in turn emits a different audible pseudo-random binary sequence (PRBS), specifically a maximum length sequence (MLS). Each device captures both emitted signals in one continuous recording. The propagation delay between the devices is calculated by comparing their respective recordings, in particular the temporal positions of the emitted signals within each recording. Each device emits one of the signals, records both signals, and then sends its recording to a master device for analysis, which is performed by a custom web application and is therefore independent of operating system. A mean error of 32.29 mm was found in initial testing, conducted using Samsung Galaxy A10 devices running Android 10. The key innovation of this method is that it requires no clock synchronisation between devices, because the distance is determined by comparing inter-transmission delays in the two recordings. Potential future improvements are discussed, including taking into account the exact locations of each phone's microphone and speaker to increase accuracy.
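The no-synchronisation trick can be reproduced in a few lines. The simulation below is a minimal sketch under idealised assumptions (ideal channel, no clock drift, arbitrary emission times; not the authors' code): each device's recording contains both MLS signals, and subtracting the two intra-recording delays cancels the unknown emission times, leaving twice the time of flight.

```python
# Sketch: two-way acoustic ranging from two unsynchronised recordings.
import numpy as np
from scipy.signal import max_len_seq, correlate

FS, C = 44100, 343.0                       # sample rate (Hz), speed of sound (m/s)
mls_a = max_len_seq(12)[0] * 2.0 - 1.0     # two different audible MLS signals
mls_b = max_len_seq(13)[0] * 2.0 - 1.0

def place(signal, at, length):
    out = np.zeros(length)
    out[at:at + signal.size] = signal
    return out

dist = 1.5                                 # ground-truth distance to recover (m)
flight = int(round(dist / C * FS))         # propagation delay in samples
n = FS                                     # one second of "recording"
# Device A emits at sample 1000, device B at sample 20000: arbitrary, unsynced.
rec_a = place(mls_a, 1000, n) + place(mls_b, 20000 + flight, n)
rec_b = place(mls_a, 1000 + flight, n) + place(mls_b, 20000, n)

def arrival(recording, template):
    return int(np.argmax(correlate(recording, template, mode="valid")))

delta_a = arrival(rec_a, mls_b) - arrival(rec_a, mls_a)   # B minus A, heard at A
delta_b = arrival(rec_b, mls_b) - arrival(rec_b, mls_a)   # B minus A, heard at B
print(f"estimated distance: {C * (delta_a - delta_b) / (2 * FS):.3f} m")  # ~1.5
```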
{"title":"Distance measurement between smartphones within an ad-hoc camera array, using audible PRBS","authors":"Pádraic McEvoy, D. Berry, Ted Burke","doi":"10.56541/kebp4512","DOIUrl":"https://doi.org/10.56541/kebp4512","url":null,"abstract":"An approach for measuring the distance between two smartphones is presented in this paper. The method uses each smartphone’s microphone(s) and speaker(s) to concurrently emit and record audio in order to calculate the sound propagation delay and hence distance. Each device in turn emits a different audible pseudo-random binary sequence (PRBS) - specifically, a maximum length sequence (MLS). Each device captures both emitted signals in one continuous recording. The propagation delay between the devices is calculated by comparing their respective recordings, and in particular the temporal positions of the emitted signals within each recording. Each device emits one of the signals, records both signals, and then sends its recording to a master device for analysis, which is performed by a custom web application and is therefore independent of operating system. A mean error of 32.29 mm was found in initial testing, which was conducted using Samsung Galaxy A10 devices running Android 10. The key innovation in this method is that it requires no clock time synchronisation between devices because the distance is determined by comparing inter-transmission delays in the two recordings. Potential future improvements are discussed, including how to take into account the exact locations of each phone’s microphone and speaker to increase accuracy.","PeriodicalId":180076,"journal":{"name":"24th Irish Machine Vision and Image Processing Conference","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125104420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Comparison of Feature Extraction Methods Applied to Thermal Sensor Binary Image Data to Classify Bed Occupancy
Pub Date: 2022-08-31 DOI: 10.56541/qlzv1440
Rebecca Hand, I. Cleland, C. Nugent
Low-resolution thermal sensing technology is suitable for sleep monitoring because it is light-invariant and privacy-preserving. Feature extraction is a critical step in facilitating robust detection and tracking, so this paper compares a blob-analysis approach that extracts statistical features with two common feature-descriptor algorithms (SURF and KAZE). The features are extracted from binary thermal image data for the purpose of detecting bed occupancy. Four common machine learning models (SVM, KNN, DT and NB) were trained and evaluated using leave-one-subject-out validation. The SVM trained on feature-descriptor data achieved the highest accuracy, 0.961.
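As a concrete illustration of the blob-analysis branch, the sketch below (an assumed feature set, not necessarily the paper's exact one) extracts per-blob statistics from a binary thermal frame in the form a classifier such as an SVM can consume.

```python
# Sketch: statistical blob features from a binary thermal frame (OpenCV).
import numpy as np
import cv2

def blob_features(binary_frame):
    """binary_frame: uint8 array, 0 = background, 255 = foreground."""
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(binary_frame)
    feats = []
    for i in range(1, n):                  # label 0 is the background
        x, y, w, h, area = stats[i]
        feats.append({
            "area": int(area),
            "aspect_ratio": w / h,
            "extent": area / (w * h),      # fill ratio of the bounding box
            "centroid": tuple(centroids[i]),  # useful for tracking
        })
    return feats

# Toy 8x8 "thermal" frame with one warm blob where an occupant would be.
frame = np.zeros((8, 8), np.uint8)
frame[2:6, 3:7] = 255
print(blob_features(frame))
```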
{"title":"A Comparison of Feature Extraction Methods Applied to Thermal Sensor Binary Image Data to Classify Bed Occupancy","authors":"Rebecca Hand, I. Cleland, C. Nugent","doi":"10.56541/qlzv1440","DOIUrl":"https://doi.org/10.56541/qlzv1440","url":null,"abstract":"Low-resolution thermal sensing technology is suitable for sleep monitoring due to being light invariant and privacy preserving. Feature extraction is a critical step in facilitating robust detection and tracking, therefore this paper compares a blob analysis approach of extracting statistical features to several common feature descriptor algorithm approaches (SURF and KAZE). The features are extracted from thermal binary image data for the purpose of detecting bed occupancy. Four common machine learning models (SVM, KNN, DT and NB) were trained and evaluated using a leave-one-subject-out validation method. The SVM trained with feature descriptor data achieved the highest accuracy of 0.961.","PeriodicalId":180076,"journal":{"name":"24th Irish Machine Vision and Image Processing Conference","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133368388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
On the Feasibility of Privacy-Secured Facial Authentication for low-power IoT Devices - Quantifying the Effects of Head Pose Variation on End-to-End Neural Face Recognition
Pub Date: 2022-08-31 DOI: 10.56541/fevr2516
Wang Yao, Viktor Varkarakis, Joseph Lemley, P. Corcoran
Recent low-power neural accelerator hardware provides a solution for end-to-end private and secure facial authentication, for applications such as smart refueling machine locks in shared accommodation, smart speakers, or televisions that respond only to family members. This work explores the impact of head pose variation on the performance of a state-of-the-art face recognition model. A synthetic technique is employed to introduce head pose variation into data samples. Experiments show that the synthetic pose variations affect face recognition performance much as real samples with pose variations do. The impact of large head pose variations on the face recogniser was then explored by further amplifying the angle of the synthetic head pose. The accuracy of the face recognition model is found to deteriorate as the pose increases. After fine-tuning, the model approaches the accuracy achieved on frontal faces across all pose variations, indicating that the face recognition model can be tuned to compensate for the effect of large poses.
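The evaluation pattern described, accuracy binned by pose angle, is simple to set out. The sketch below uses synthetic numbers and a toy accuracy-decay model (both assumptions, purely to show the bookkeeping), not the paper's recognition results.

```python
# Sketch: bin face-verification outcomes by head-pose angle.
import numpy as np

rng = np.random.default_rng(0)
yaw = rng.uniform(0, 60, 5000)                    # synthetic pose angles (deg)
# Toy model: probability of a correct match decays as yaw grows.
correct = rng.random(5000) < (0.99 - 0.004 * yaw)

for lo in range(0, 60, 15):
    mask = (yaw >= lo) & (yaw < lo + 15)
    print(f"yaw {lo:2d}-{lo + 15:2d} deg: accuracy {correct[mask].mean():.3f}")
```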
{"title":"On the Feasibility of Privacy-Secured Facial Authentication for low-power IoT Devices - Quantifying the Effects of Head Pose Variation on End-to-End Neural Face Recognition","authors":"Wang Yao, Viktor Varkarakis, Joseph Lemley, P. Corcoran","doi":"10.56541/fevr2516","DOIUrl":"https://doi.org/10.56541/fevr2516","url":null,"abstract":"Recent low-power neural accelerator hardware provides a solution for end-to-end privacy and secure facial authentication, such as smart refueling machine locks in shared accommodation, smart speakers, or televisions that respond only to family members. This work explores the impact that head pose variation has on the performance of a state-of-the-art face recognition model. A synthetic technique is employed to introduce head pose variation into data samples. Experiments show that the synthetic pose variations have a similar effect on face recognition performance as the real samples with pose variations. The impact of large variations of head poses on the face recognizer was then explored by further amplifying the angle of the synthetic head pose. It is found that the accuracy of the face recognition model deteriorates as the pose increases. After fine-tuning the network, the face recognition model achieves close to the accuracy of frontal faces in all pose variations, indicating that the face recognition model can be tuned to compensate for the effect of large poses.","PeriodicalId":180076,"journal":{"name":"24th Irish Machine Vision and Image Processing Conference","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125932219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
A Comparative Study of Traditional Light Field Methods and NeRF
Pub Date: 2022-08-31 DOI: 10.56541/iqkc6774
Pierre Matysiak, Susana Ruano, Martin Alain, A. Smolic
Neural Radiance Fields (NeRF) are a recent technique that has had a large impact on computer vision, promising to generate high-quality novel views and corresponding disparity maps from a fairly small number of input images. In effect, they are a new way to represent a light field. In this paper, we compare NeRF with traditional light field methods for novel view synthesis and depth estimation, in an attempt to quantify the advantages brought by NeRF and to put these results in perspective with how both paradigms are used in practice. We provide qualitative and quantitative comparisons, discuss them, and highlight some aspects of working with NeRF depending on the type of light field data used.
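For the quantitative side of such a comparison, PSNR against a held-out ground-truth view is the usual headline metric for both paradigms. A minimal sketch follows (toy arrays stand in for real renders; the noise levels are arbitrary assumptions).

```python
# Sketch: PSNR of synthesised novel views against ground truth.
import numpy as np

def psnr(reference, rendered, peak=255.0):
    mse = np.mean((reference.astype(np.float64) - rendered.astype(np.float64)) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# Toy data standing in for one ground-truth view and two renderings of it.
rng = np.random.default_rng(1)
truth = rng.integers(0, 256, (64, 64, 3)).astype(np.uint8)
view_1 = np.clip(truth + rng.normal(0, 2, truth.shape), 0, 255)  # less noisy
view_2 = np.clip(truth + rng.normal(0, 5, truth.shape), 0, 255)  # more noisy
print(f"render 1: {psnr(truth, view_1):.1f} dB")
print(f"render 2: {psnr(truth, view_2):.1f} dB")
```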
{"title":"A Comparative Study of Traditional Light Field Methods and NeRF","authors":"Pierre Matysiak, Susana Ruano, Martin Alain, A. Smolic","doi":"10.56541/iqkc6774","DOIUrl":"https://doi.org/10.56541/iqkc6774","url":null,"abstract":"Neural Radiance Fields (NeRF) is a recent technology which had a large impact in computer vision, promising to generate high quality novel views and corresponding disparity map, all using a fairly small number of input images. In effect, they are a new way to represent a light field. In this paper, we compare NeRF with traditional light field methods for novel view synthesis and depth estimation, in an attempt to quantify the advantages brought by NeRF, and to put these results in perspective with the way both paradigms are used practically. We provide qualitative and quantitative comparisons, discuss them and highlight some aspects of working with NeRF depending on the type of light field data used.","PeriodicalId":180076,"journal":{"name":"24th Irish Machine Vision and Image Processing Conference","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127982842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A machine vision system for avian song classification with CNN's
Pub Date: 2022-08-31 DOI: 10.56541/mhzn4111
Gabriel R. Palma, Ana Aquino, P. Monticelli, L. Verdade, C. Markham, Rafael Moral
Soundscape ecologists aim to study the acoustic characteristics of an area that reflect natural processes [Schafer, 1977]. These sounds can be classified as biological (biophony), geophysical (geophony), and human-produced (anthrophony) [Pijanowski et al., 2011]. A common task is to use sounds to identify species based on the frequency content of a given signal. The signal can be converted into spectrograms, enabling other types of analysis to automate species identification. Based on the promising results of deep learning methods such as Convolutional Neural Networks (CNNs) in image classification, here we propose the use of a pre-trained VGG16 CNN architecture to identify two nocturnal avian species, namely Antrostomus rufus and Megascops choliba, commonly encountered in Brazilian forests. Monitoring the abundance of these species helps ecologists develop conservation programmes, detect environmental disturbances and assess the impact of human action. Specialists recorded sounds as 16-bit wave files at a sampling rate of 44 kHz and classified the presence of these species. With the classified wave files, we created additional classes to visualise the performance of the VGG16 CNN architecture in detecting both species, ending up with six categories containing 60 seconds of audio of species vocalisation combinations and background-only sounds. We produced spectrograms using the information from each RGB channel and from a single channel (grey-scale), and applied histogram equalisation to the grey-scale images. System performance using histogram-equalised and unmodified images was compared; histogram equalisation improves contrast, and therefore visibility to a human observer, and investigating its effect on CNN performance was a feature of this study. Moreover, to show the practical application of our work, we created 51 minutes of audio containing more noise than vocalisations of either species (a scenario commonly encountered in field surveys). Our results showed that after 8,000 epochs the trained VGG16 CNN reached a training accuracy of 100% for all three approaches. The test accuracy was 80.64%, 75.26%, and 67.74% for the RGB, grey-scale, and histogram-equalised approaches respectively. The method's accuracy on the 51-minute synthetic audio file was 92.15%. This level of accuracy reveals the potential of CNN architectures for automating species detection and identification by sound using passive monitoring. Our results suggest that coloured spectrogram images generalise the classification better than grey-scale and histogram-equalised images. This study may inform future avian monitoring programmes based on passive sound recording, which significantly increases sampling size without increasing cost.
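The three input variants compared above are quick to reproduce. The sketch below (not the authors' pipeline; the toy chirp, window length and colour map are assumptions) builds a log-power spectrogram and derives the grey-scale, histogram-equalised and coloured versions at VGG16's input resolution.

```python
# Sketch: RGB, grey-scale and histogram-equalised spectrogram variants.
import numpy as np
from scipy.signal import spectrogram
import cv2

fs = 44100                                    # audio sample rate (Hz)
t = np.arange(fs * 2) / fs                    # two seconds of toy "birdsong"
audio = np.sin(2 * np.pi * (2000 + 500 * np.sin(2 * np.pi * 3 * t)) * t)

f, times, sxx = spectrogram(audio, fs=fs, nperseg=1024)
db = 10 * np.log10(sxx + 1e-12)               # log-power spectrogram
grey = cv2.normalize(db, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
grey = cv2.resize(grey, (224, 224))           # VGG16 input resolution

rgb = cv2.applyColorMap(grey, cv2.COLORMAP_VIRIDIS)  # coloured variant
equalised = cv2.equalizeHist(grey)                   # contrast-enhanced variant
print(rgb.shape, grey.shape, equalised.shape)
```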
{"title":"A machine vision system for avian song classification with CNN’s","authors":"Gabriel R. Palma, Ana Aquino, P. Monticelli, L. Verdade, C. Markham, Rafael Moral","doi":"10.56541/mhzn4111","DOIUrl":"https://doi.org/10.56541/mhzn4111","url":null,"abstract":"Soundscape ecologists aim to study the acoustic characteristics of an area that reflects natural processes [Schafer, 1977]. These sounds can be interpreted as biological (biophony), geophysical (geophony), and human-produced (anthrophony) [Pijanowski et al., 2011]. A common task is to use sounds to identify species based on the frequency content of a given signal. This signal can be further converted into spectrograms enabling other types of analysis to automate the identification of species. Based on the promising results of deep learning methods, such as Convolution Neural Networks (CNNs) in image classification, here we propose the use of a pre-trained VGG16 CNN architecture to identify two nocturnal avian species, namely Antrostomus rufus and Megascops choliba, commonly encountered in Brazilian forests. Monitoring the abundance of these species is important to ecologists to develop conservation programmes, detect environmental disturbances and assess the impact of human action. Specialists recorded sounds in 16-bit wave files at a sampling rate of 44Hz and classified the presence of these species. With the classified wave files, we created additional classes to visualise the performance of the VGG16 CNN architecture for detecting both species. We end up with six categories containing 60 seconds of audio of species vocalisation combinations and background only sounds. We produced spectrograms using the information from each RGB channel, only one channel (grey-scale), and applied the histogram equalisation technique to the grey-scale images. A comparison of the system performance using histogram equalised images and unmodified images was made. Histogram equalisation improves the contrast, and so the visibility to the human observer. Investigating the effect of histogram equalisation on the performance of the CNN was a feature of this study. Moreover, to show the practical application of our work, we created 51 minutes of audio, which contains more noise than the presence of both species (a scenario commonly encountered in field surveys). Our results showed that the trained VGG16 CNN produced, after 8000 epochs, a training accuracy of 100% for the three approaches. The test accuracy was 80.64%, 75.26%, and 67.74% for the RGB, grey-scaled, and histogram equalised approaches. The method’s accuracy on the synthetic audio file of 51 minutes was 92.15%. This accuracy level reveals the potential of CNN architectures in automating species detection and identification by sound using passive monitoring. Our results suggest that using coloured images to represent the spectrogram better generalises the classification than grey-scale and histogram equalised images. 
This study might develop future avian monitoring programmes based on passive sound recording, which significantly enhances sampling size without increasing cost.","PeriodicalId":180076,"journal":{"name":"24th Irish Machine Vision and Image Processing Conference","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122676264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Detection and Isolation of 3D Objects in Unstructured Environments
Pub Date: 2022-08-31 DOI: 10.56541/afmz9460
Dylan Do Couto, J. Butterfield, A. Murphy, K. Rafferty, Joseph Coleman
3D machine vision is a growing trend in the field of automation for Object Of Interest (OOI) interactions. This is most notable in sectors such as unorganised bin picking in manufacturing and the integration of Autonomous Guided Vehicles (AGVs) in logistics. The literature focuses on advancing this area through methods of OOI recognition and isolation that simplify more established OOI analysis operations. The main constraints of current OOI isolation methods are the loss of important data and long process durations, which extend the overall run-time of 3D machine vision operations. In this paper we propose a new method of OOI isolation that combines classical image processing techniques to reduce OOI data loss and improve run-time efficiency. Results show a high level of data retention with faster run-times than previous research. This paper also presents a series of run-time data points to set a standard for future process run-time comparisons.
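A representative chain of the classical operations involved might look like the sketch below (a generic example, not the paper's specific combination): threshold, morphological clean-up, then one binary mask per connected object, so that downstream analysis sees a single OOI at a time.

```python
# Sketch: isolate objects of interest with classical image processing.
import numpy as np
import cv2

def isolate_objects(grey):
    """Return one binary mask per detected object in a greyscale image."""
    _, binary = cv2.threshold(grey, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    clean = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)  # drop speckle
    n, labels = cv2.connectedComponents(clean)
    return [(labels == i).astype(np.uint8) * 255 for i in range(1, n)]

# Toy scene: two bright "objects" on a dark background.
scene = np.zeros((100, 100), np.uint8)
cv2.circle(scene, (30, 30), 10, 200, -1)
cv2.rectangle(scene, (60, 55), (85, 80), 180, -1)
print(len(isolate_objects(scene)), "objects isolated")
```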
{"title":"Detection and Isolation of 3D Objects in Unstructured Environments","authors":"Dylan Do Couto, J. Butterfield, A. Murphy, K. Rafferty, Joseph Coleman","doi":"10.56541/afmz9460","DOIUrl":"https://doi.org/10.56541/afmz9460","url":null,"abstract":"3D machine vision is a growing trend in the filed of automation for Object Of Interest (OOI) interactions. This is most notable in sectors such as unorganised bin picking for manufacturing and the integration of Autonomous Guided Vehicles (AGVs) in logistics. In the literature, there is a key focus on advancing this area of research through methods of OOI recognition and isolation to simplify more established OOI analysis operations. The main constraint in current OOI isolation methods is the loss of important data and a long process duration which extends the overall run-time of 3D machine vision operations. In this paper we propose a new method of OOI isolation that utilises a combination of classical image processing techniques to reduce OOI data loss and improve run-time efficiency. Results show a high level of data retention with comparable faster run-times to previous research. This paper also hopes to present a series of run-time data points to set a standard for future process run-time comparisons.","PeriodicalId":180076,"journal":{"name":"24th Irish Machine Vision and Image Processing Conference","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131809641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Towards Temporal Stability in Automatic Video Colourisation
Pub Date: 2022-08-31 DOI: 10.56541/zvhf9195
Rory Ward, J. Breslin
Much research has been carried out into the automatic restoration of archival images, ranging from colourisation to damage restoration and super-resolution. Video restoration, by contrast, has remained largely unexplored. Most efforts to date have extended a concept from image restoration to video in a frame-by-frame manner. These methods result in poor temporal consistency between frames, which manifests as temporal instability, or flicker. The purpose of this work is to improve upon this limitation by employing a hybrid approach of deep learning and exemplar-based colourisation: the colourisation of the current frame is informed by the colourisations of its neighbouring frames, alleviating inter-frame discrepancies. This paper makes two main contributions. Firstly, a novel end-to-end automatic video colourisation technique with enhanced flicker reduction is proposed. Secondly, six automatic exemplar acquisition algorithms are compared. The combination of these algorithms and techniques yields an 8.5% increase in no-reference image quality over the previous state of the art.
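One simple way to see why neighbour-informed colourisation damps flicker is to blend each frame's predicted chrominance with the previous output, so colour decisions cannot jump freely between frames. The sketch below is an illustrative stand-in, not the paper's method; `colourise` is a placeholder for any per-frame model.

```python
# Sketch: temporal blending of per-frame chrominance predictions.
import numpy as np

def colourise(grey_frame, rng):
    # Placeholder model: random chrominance, deliberately unstable.
    return rng.random(grey_frame.shape + (2,))            # (H, W, [Cb, Cr])

def stabilised_stream(grey_frames, alpha=0.7, seed=0):
    rng, prev = np.random.default_rng(seed), None
    for grey in grey_frames:
        chroma = colourise(grey, rng)
        if prev is not None:
            chroma = alpha * chroma + (1 - alpha) * prev  # temporal blend
        prev = chroma
        yield np.dstack([grey, chroma[:, :, 0], chroma[:, :, 1]])  # YCbCr

frames = [np.full((4, 4), 0.5) for _ in range(3)]         # toy luma frames
for out in stabilised_stream(frames):
    print(out.shape)   # (4, 4, 3) frames with damped colour changes
```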
{"title":"Towards Temporal Stability in Automatic Video Colourisation","authors":"Rory Ward, J. Breslin","doi":"10.56541/zvhf9195","DOIUrl":"https://doi.org/10.56541/zvhf9195","url":null,"abstract":"Much research has been carried out into the automatic restoration of archival images. This research ranges from colourisation, to damage restoration, and super-resolution. Conversely, video restoration hasremained largely unexplored. Most efforts to date have involved extending a concept from image restoration to video, in a frame-by-frame manner. These methods result in poor temporal consistency between frames. This manifests itself as temporal instability or flicker. The purpose of this work is to improve upon this limitation. This improvement will be achieved by employing a hybrid approach of deep-learning and exemplar based colourisation. Thus, informing current frame colourisation about its neighbouring frame’s colourisations and therefore alleviating the inter-frame discrepancy issues. This paper has two main contributions. Firstly, a novel end-to-end automatic video colourisation technique with enhanced flicker reduction capabilities is proposed. Secondly, six automatic exemplar acquisition algorithms are compared. The combination of these algorithms and techniques allow for an 8.5% increase in non-referenced image quality over the previous state of the art.","PeriodicalId":180076,"journal":{"name":"24th Irish Machine Vision and Image Processing Conference","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116778621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Acoustic Source Localization Using Straight Line Approximations
Pub Date: 2022-08-31 DOI: 10.56541/ljrb7078
Swarnadeep Bagchi, Ruairí de Fréin
This short paper extends an acoustic signal delay estimation method to the general anechoic scenario using image processing techniques. The proposed technique localises acoustic speech sources by creating a matrix of phase-versus-frequency histograms, in which equal phases are stacked in the appropriate bins. With larger delays and multiple sources coexisting in the same matrix, it becomes cluttered with activated bins. This results in high-intensity spots on the spectrogram, making source discrimination difficult. We employ morphological filtering, chain coding and straight-line approximations to suppress noise and enhance the target signal features. Lastly, the Hough transform is used for source localisation. The resulting estimates are accurate and invariant to the sampling rate, and should find application in acoustic source separation.
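The straight-line structure the method exploits is easy to demonstrate: for a pure delay, the cross-spectrum phase is linear in frequency with a slope proportional to the delay. In the sketch below (synthetic signals; a least-squares line fit stands in for the paper's morphological filtering and Hough-transform line detection), the delay is read off that slope.

```python
# Sketch: recover an inter-channel delay from the phase-vs-frequency line.
import numpy as np

fs, n, delay = 16000, 1024, 12                  # delay in samples
rng = np.random.default_rng(0)
x = rng.standard_normal(fs)
y = np.roll(x, delay)                           # delayed copy of the source

# Cross-spectrum averaged over short frames for robustness.
cross = np.mean([np.fft.rfft(x[i:i + n]) * np.conj(np.fft.rfft(y[i:i + n]))
                 for i in range(0, fs - n, n)], axis=0)
phase = np.unwrap(np.angle(cross))
bins = np.arange(phase.size)                    # FFT bin index

# For this delay, phase = 2*pi*delay*bin/n, so the slope recovers the delay.
slope = np.polyfit(bins, phase, 1)[0]
print(f"estimated delay: {slope * n / (2 * np.pi):.2f} samples")  # ~12
```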
{"title":"Acoustic Source Localization Using Straight Line Approximations","authors":"Swarnadeep Bagchi, Ruairí de Fréin","doi":"10.56541/ljrb7078","DOIUrl":"https://doi.org/10.56541/ljrb7078","url":null,"abstract":"The short paper extends an acoustic signal delay estimation method to general anechoic scenario using image processing techniques. The technique proposed in this paper localizes acoustic speech sources by creating a matrix of phase versus frequency histograms, where the same phases are stacked in appropriate bins. With larger delays and multiple sources coexisting in the same matrix, it becomes cluttered with activated bins. This results in high intensity spots on the spectrogram, making source discrimination difficult. In this paper, we have employed morphological filtering, chain-coding and straight line approximations to ignore noise and enhance the target signal features. Lastly, Hough transform is used for the source localization. The resulting estimates are accurate and invariant to the sampling-rate and shall have application in acoustic source separation.","PeriodicalId":180076,"journal":{"name":"24th Irish Machine Vision and Image Processing Conference","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115383043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0