
24th Irish Machine Vision and Image Processing Conference: Latest Publications

Beyond Social Distancing: Application of real-world coordinates in a multi-camera system with privacy protection
Pub Date : 2022-08-31 DOI: 10.56541/rtns4233
Frances Ryan, Feiyan Hu, J. Dietlmeier, N. O’Connor, Kevin McGuinness
In this paper, we develop a privacy-preserving framework to detect and track pedestrians and project them to real-world coordinates, facilitating social distancing detection. The transform is calculated using social distancing markers or floor tiles visible in the camera view, without an extensive calibration process. We select a lightweight detection model to process CCTV videos and perform within-camera tracking. The features collected during within-camera tracking are then used to associate passenger trajectories across multiple cameras. We demonstrate and qualitatively analyse results for both social distancing detection and multi-camera tracking on real-world data captured in a busy airport in Dublin, Ireland.
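The abstract does not detail the marker-based transform; as an illustrative numpy-only sketch (function names and the four-marker setup are assumptions, not the authors' code), a planar homography can be estimated from floor-marker correspondences and used to project detected foot positions to floor coordinates:

```python
import numpy as np

def homography_from_markers(img_pts, world_pts):
    """Estimate the 3x3 homography mapping image points to floor-plane
    coordinates from >= 4 correspondences (e.g. social-distancing markers
    or floor-tile corners), via the direct linear transform."""
    A, b = [], []
    for (x, y), (u, v) in zip(img_pts, world_pts):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b += [u, v]
    h = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)[0]
    return np.append(h, 1.0).reshape(3, 3)

def to_world(H, img_pt):
    """Project an image point (e.g. a pedestrian's foot position) to
    real-world floor coordinates."""
    p = H @ np.array([img_pt[0], img_pt[1], 1.0])
    return p[:2] / p[2]
```

Pairwise distances between projected points (in metres, if the markers are given in metres) then give the social distancing measurement directly.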
Citations: 0
A Data Augmentation and Pre-processing Technique for Sign Language Fingerspelling Recognition
Pub Date : 2022-08-31 DOI: 10.56541/xbav3102
Frank Fowley, Ellen Rushe, Anthony Ventresque
The reliance of deep learning algorithms on large-scale datasets is a significant challenge for sign language recognition (SLR). The shortage of data resources for training SLR models inevitably leads to poor generalisation, especially for low-resource languages. We propose novel data augmentation and preprocessing techniques based on synthetic data generation to overcome these generalisation difficulties. Using these methods, our models achieved a top-1 accuracy of 86.7% and a top-2 accuracy of 95.5% when evaluated against an unseen corpus of Irish Sign Language (ISL) fingerspelling video recordings. We believe that this constitutes a state-of-the-art performance baseline for an Irish Sign Language recognition model when tested on an unseen dataset.
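The paper's synthetic-generation pipeline is not described here; as a generic sketch of the kind of label-preserving frame augmentation commonly used alongside it (the parameter ranges are assumptions), one might perturb each fingerspelling video frame like this:

```python
import numpy as np

def augment_frame(frame, rng):
    """Apply simple label-preserving augmentations to one video frame
    (H x W x C, floats in [0, 1]): brightness jitter, a small random
    translation, and mild sensor noise. Horizontal flips are avoided
    since they would swap the signing hand."""
    out = frame * rng.uniform(0.8, 1.2)                  # brightness jitter
    dy, dx = rng.integers(-4, 5, size=2)
    out = np.roll(out, (int(dy), int(dx)), axis=(0, 1))  # small translation
    out = out + rng.normal(0.0, 0.02, size=out.shape)    # sensor noise
    return np.clip(out, 0.0, 1.0)
```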
Citations: 0
Sign2Speech: A Novel Sign Language to Speech Synthesis Pipeline
Pub Date : 2022-08-31 DOI: 10.56541/ctdh7516
Dan Bigioi, Théo Morales, Ayushi Pandey, Frank Fowley, Peter Corcoran, Julie Carson-Berndsen
The lack of assistive Sign Language technologies for members of the Deaf community has impeded their access to public information, and curtailed their civil rights and social inclusion. In this paper, we introduce a novel proof-of-concept method for end-to-end Sign Language to speech translation without an intermediate text representation. We propose an LSTM-based method to generate speech from hand pose, where the latter can be obtained by applying an off-the-shelf pose predictor to fingerspelling videos. We train our model using a custom dataset of synthetically generated signs annotated with speech labels, and test on a real-world dataset of fingerspelling signs. Our generated output resembles real-world data sufficiently on quantitative measurements. This indicates that our techniques can be used to generate speech from signs, without reliance on text. The use of synthetic datasets further reduces the reliance on real-world, annotated data. However, results can be further improved using hybrid datasets, combining real-world and synthetic data. Our code and datasets are available at https://github.com/DanBigioi/Sign2Speech.
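The shape of such a pose-to-speech mapping can be sketched with a single numpy LSTM layer feeding a linear acoustic head (all names, sizes, and the gate layout are illustrative assumptions, not the authors' architecture):

```python
import numpy as np

def lstm_pose_to_frames(poses, params):
    """Run one LSTM layer over a sequence of hand-pose vectors and map
    each hidden state to an acoustic frame (e.g. one mel-spectrogram
    frame) with a linear head. `params` holds (Wx, Wh, b, Wo, bo) in
    the standard concatenated-gate convention [i, f, g, o]."""
    Wx, Wh, b, Wo, bo = params
    H = Wh.shape[1]                       # hidden size
    h, c = np.zeros(H), np.zeros(H)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    frames = []
    for x in poses:                       # one pose vector per video frame
        z = Wx @ x + Wh @ h + b
        i, f = sig(z[:H]), sig(z[H:2 * H])
        g, o = np.tanh(z[2 * H:3 * H]), sig(z[3 * H:])
        c = f * c + i * g                 # cell-state update
        h = o * np.tanh(c)
        frames.append(Wo @ h + bo)        # acoustic frame for this step
    return np.stack(frames)
```

A vocoder would then turn the predicted frame sequence into a waveform; that stage is outside this sketch.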
Citations: 0
Diversity Issues in Skin Lesion Datasets
Pub Date : 2022-08-31 DOI: 10.56541/kppv3732
N. Alipour, Ted Burke, J. Courtney
Melanoma is one of the most threatening skin cancers in the world and may spread to other parts of the body if it is not detected at an early stage. Researchers have therefore put considerable effort into computer-aided methods that help dermatologists recognise this kind of cancer, many of them based on deep learning models. To train these models to high accuracy, datasets large enough to cover gender, race, and skin type diversity are required. Although there is a large body of data on melanoma and skin lesions, most of it does not cover a broad diversity of skin types, which can affect the accuracy of models trained on it. To understand the issue, the diversity of each database must first be assessed; then, based on the existing shortcomings, such as under-represented skin types, a suitable method must be developed to resolve them. This article summarises the lack of gender, race, and skin type diversity in skin lesion datasets and takes a brief look at potential solutions to this problem, especially the lesser discussed colour-based methods.
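The first step the abstract calls for, assessing a database's diversity, can be as simple as tabulating a demographic attribute over the dataset's metadata. A minimal sketch, assuming records carry a hypothetical `fitzpatrick_type` field (Fitzpatrick skin types I-VI):

```python
from collections import Counter

def diversity_report(records, key="fitzpatrick_type"):
    """Summarise how a lesion dataset covers a demographic attribute.
    Records lacking the attribute are reported as 'missing', which is
    itself a common shortcoming of public skin lesion datasets."""
    counts = Counter(r.get(key, "missing") for r in records)
    total = sum(counts.values())
    return {k: n / total for k, n in sorted(counts.items())}
```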
Citations: 0
Geometrically reconstructing confocal microscopy images for modelling the retinal microvasculature as a 3D cylindrical network
Pub Date : 2022-08-31 DOI: 10.56541/ktxe9847
Evan P. Troendle, P. Barabas, Tim Curtis
Microvascular networks can be modelled as a network of connected cylinders. Presently, however, there are limited approaches with which to recover these networks from biomedical images. We have therefore developed and implemented computer algorithms to geometrically reconstruct three-dimensional (3D) retinal microvascular networks from micrometre-scale imagery, resulting in a concise representation, stored in a delimited text file, of the two endpoints and radius of each detected cylinder. This format is suitable for a variety of purposes, including efficient simulations of molecular delivery. Here, we detail a semi-automated pipeline consisting of the detection of retinal microvascular volumes within 3D imaging datasets, the enhancement and analysis of these volumes for reconstruction, and the geometric construction algorithm itself, which converts voxel data into representative 3D cylindrical objects.
Citations: 0
Pre- and Post-Operative Analysis of Planar Radiographs in Total Hip Replacement
Pub Date : 2022-08-31 DOI: 10.56541/exjl3727
O. Denton, Christopher Madden-McKee, Janet C. Hill, D. Beverland, N. Dunne, A. Lennon
Computed tomography (CT) scans represent the gold standard for accuracy when preoperatively templating and postoperatively assessing the hip. However, planar radiographs are used as standard practice, sacrificing accuracy. In this work, a method is proposed to more accurately assess femoral offset and neck-shaft angle from two planar radiographs (frontal and lateral), allowing more reliable templating of a modular stem. A second method is proposed to accurately assess postoperative stem version from planar frontal radiographs.
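The geometric intuition behind combining two views can be sketched in a few lines; note this is a simplification for illustration only (the authors' method is not detailed here, and real measurements must also correct for magnification and limb rotation):

```python
import math

def neck_shaft_angle(shaft_vec, neck_vec):
    """Angle in degrees between the femoral shaft axis and the neck
    axis, from 2D landmark-derived vectors on one radiograph."""
    dot = shaft_vec[0] * neck_vec[0] + shaft_vec[1] * neck_vec[1]
    norms = math.hypot(*shaft_vec) * math.hypot(*neck_vec)
    return math.degrees(math.acos(dot / norms))

def combined_offset(frontal_mm, lateral_mm):
    """Combine offset components measured on two (assumed orthogonal)
    planar views into a single in-space magnitude via Pythagoras."""
    return math.hypot(frontal_mm, lateral_mm)
```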
Citations: 0
Integrating feature attribution methods into the loss function of deep learning classifiers
Pub Date : 2022-08-31 DOI: 10.56541/omxa8857
James Callanan, Carles Garcia-Cabrera, Niamh Belton, G. Roshchupkin, Kathleen M. Curran
Feature attribution methods are typically used post-training to judge if a deep learning classifier is using meaningful concepts in an input image when making classifications. In this study, we propose using feature attribution methods to give a classifier automated feedback throughout the training process via a novel loss function. We call such a loss function a heatmap loss function. Heatmap loss functions enable us to incentivize a model to rely on relevant sections of the input image when making classifications. Two groups of models were trained, one group with a heatmap loss function and the other using categorical cross entropy (CCE). Models trained with the heatmap loss function were capable of achieving equivalent classification accuracies on a test dataset of synthesised cardiac MRI slices. Moreover, HiResCAM heatmaps suggest that these models relied to a greater extent on regions of the MRI slices within the heart. A further experiment demonstrated how heatmap loss functions can be used to prevent deep learning classifiers from using noncausal concepts that disproportionately co-occur with images of a certain class when making classifications. This suggests that heatmap loss functions could be used to prevent models from learning dataset biases by directing where the model should be looking when making classifications.
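The idea can be illustrated on the smallest possible case, a logistic model, where the gradient-times-input attribution has a closed form (w * x). This mirrors the concept of penalising attribution outside a relevance mask, not the paper's exact HiResCAM-based formulation:

```python
import numpy as np

def heatmap_loss(w, x, y, mask, lam=1.0):
    """Binary cross entropy plus a penalty on gradient-x-input
    attribution falling outside the relevant region (mask == 1).
    For a linear logit z = w.x the attribution is simply w * x."""
    z = float(w @ x)
    p = 1.0 / (1.0 + np.exp(-z))            # predicted probability
    eps = 1e-12
    bce = -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    attribution = w * x                     # gradient-times-input saliency
    off_target = attribution * (1 - mask)   # attribution outside the mask
    return bce + lam * np.mean(off_target ** 2)
```

Minimising this pushes the model to place weight only on masked (relevant) features, which is the behaviour the study incentivises during training.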
Citations: 0
Triple Loss based Satellite Image Localisation for Aerial Platforms
Pub Date : 2022-08-31 DOI: 10.56541/pjfn5642
Eduardo Andres Avila Herrera, Tim McCarhy, J. McDonald
We present a vision-based technique for aerial platform localisation using satellite imagery. Our approach applies a modified VGG16 network in conjunction with a triplet loss to encode aerial views as discriminative scene embeddings. The platform is localised by comparing the encoding of its current view with a database of pre-encoded embeddings using a cosine similarity metric. Recent image-based localisation research has shown potential for such learned embeddings; however, to ensure reliable matching they require dense sampling of views of the environment, thereby limiting their operational area. In contrast, the combination of our proposed architecture in conjunction with the triplet loss shows robustness over greater spatial shifts, reducing the need for dense sampling. We demonstrate these improvements through comparison with a state-of-the-art approach using simulated ground truth sequences derived from a real-world satellite dataset covering a 1.5km × 1km region in Karlsruhe.
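Both ingredients named in the abstract, cosine-similarity lookup against a pre-encoded database and the triplet loss, have standard forms; a numpy sketch (the network producing the embeddings is omitted):

```python
import numpy as np

def cosine_localise(query, database):
    """Return the index of the pre-encoded satellite-view embedding
    most similar to the query embedding under cosine similarity."""
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    return int(np.argmax(db @ q))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss on embedding distances: pull the anchor
    towards the positive view, push it from the negative, up to a
    margin. margin=0.2 is a common default, not the paper's value."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)
```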
Citations: 0
Influence of Magnification in Deep Learning Aided Image Segmentation in Histological Digital Image Analysis
Pub Date : 2022-08-31 DOI: 10.56541/rakl2135
Kris McCombe, Stephanie G Craig, Jacqueline James, R. Gault
The use of digital pathology has grown significantly for both healthcare and research purposes in recent years. With this comes the opportunity to develop systems supported by computer vision (CV) and artificial intelligence (AI), with the potential to improve patient management and quality of care. The accessibility of CV and AI toolboxes has resulted in the rapid application of image analysis in this domain, driven by accuracy-related metrics. However, in this short paper we illustrate common pitfalls in the field through a semantic segmentation task, specifically showing how magnification can influence training data quality and demonstrating how this can ultimately affect model robustness.
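One way the magnification effect can be made concrete (a toy sketch, not the paper's experiment) is to measure how much of a fine annotation survives a round trip through a coarser resolution:

```python
import numpy as np

def iou_after_downsampling(mask, factor):
    """Nearest-neighbour downsample a binary annotation mask by
    `factor`, upsample it back, and report the IoU against the
    original. Thin structures degrade; coarse blobs survive."""
    small = mask[::factor, ::factor]
    restored = np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)
    restored = restored[: mask.shape[0], : mask.shape[1]]
    inter = np.logical_and(mask, restored).sum()
    union = np.logical_or(mask, restored).sum()
    return inter / union if union else 1.0
```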
Citations: 0
Texture improvement for human shape estimation from a single image
Pub Date : 2022-08-31 DOI: 10.56541/soww6683
Jorge Gonzalez Escribano, Susana Rauno, A. Swaminathan, David Smyth, A. Smolic
Current techniques for human digitization from a single image show promising results in the quality of the estimated geometry, but they often fall short in the texture of the generated 3D model, especially on the occluded side of the person; some do not output a texture for the model at all. Our goal in this paper is to improve the predicted texture of these models without requiring any input beyond the original image used to generate the 3D model in the first place. To that end, we propose a novel way to predict the back view of the person by including semantic and positional information, outperforming state-of-the-art techniques. Our method is based on a general-purpose image-to-image translation algorithm with conditional adversarial networks, adapted to predict the back view of a human. Furthermore, we use the predicted image to improve the texture of the estimated 3D model, and we provide a 3D dataset, V-Human, to train our method and also any 3D human shape estimation algorithms which use meshes, such as PIFu.
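Conditional adversarial image-to-image translation of this kind typically trains the generator with an adversarial term plus an L1 term tying the predicted back view to the ground truth. A numpy sketch of that objective (`lam=100` is the common pix2pix-style default, assumed here rather than taken from the paper):

```python
import numpy as np

def generator_loss(d_fake, fake, target, lam=100.0):
    """Conditional-GAN generator objective: fool the discriminator
    (BCE against the 'real' label, with d_fake holding discriminator
    scores in (0, 1)) plus a lambda-weighted L1 term between the
    predicted back view and the ground-truth back view."""
    eps = 1e-12
    adv = float(-np.mean(np.log(d_fake + eps)))   # adversarial term
    l1 = float(np.mean(np.abs(fake - target)))    # pixel reconstruction term
    return adv + lam * l1
```

The L1 term keeps the predicted back view anchored to the person's appearance, while the adversarial term sharpens texture detail that pure L1 would blur.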
Citations: 2