Beyond Social Distancing: Application of real-world coordinates in a multi-camera system with privacy protection
Frances Ryan, Feiyan Hu, J. Dietlmeier, N. O’Connor, Kevin McGuinness
In this paper, we develop a privacy-preserving framework to detect and track pedestrians and project them to real-world coordinates, facilitating social distancing detection. The transform is calculated using social distancing markers or floor tiles visible in the camera view, without an extensive calibration process. We select a lightweight detection model to process CCTV videos and perform within-camera tracking. The features collected during within-camera tracking are then used to associate passenger trajectories across multiple cameras. We demonstrate and analyze results qualitatively for both social distancing detection and multi-camera tracking on real-world data captured in a busy airport in Dublin, Ireland.
DOI: https://doi.org/10.56541/rtns4233

A Data Augmentation and Pre-processing Technique for Sign Language Fingerspelling Recognition
Frank Fowley, Ellen Rushe, Anthony Ventresque
The reliance of deep learning algorithms on large-scale datasets is a significant challenge for sign language recognition (SLR). The shortage of data resources for training SLR models inevitably leads to poor generalisation, especially for low-resource languages. We propose novel data augmentation and preprocessing techniques based on synthetic data generation to overcome these generalisation difficulties. Using these methods, our models achieved a top-1 accuracy of 86.7% and a top-2 accuracy of 95.5% when evaluated against an unseen corpus of Irish Sign Language (ISL) fingerspelling video recordings. We believe this constitutes a state-of-the-art performance baseline for an Irish Sign Language recognition model tested on an unseen dataset.
DOI: https://doi.org/10.56541/xbav3102

Sign2Speech: A Novel Sign Language to Speech Synthesis Pipeline
Dan Bigioi, Théo Morales, Ayushi Pandey, Frank Fowley, Peter Corcoran, Julie Carson-Berndsen
The lack of assistive Sign Language technologies for members of the Deaf community has impeded their access to public information and curtailed their civil rights and social inclusion. In this paper, we introduce a novel proof-of-concept method for end-to-end Sign Language to speech translation without an intermediate text representation. We propose an LSTM-based method to generate speech from hand pose, where the latter is obtained by applying an off-the-shelf pose predictor to fingerspelling videos. We train our model on a custom dataset of synthetically generated signs annotated with speech labels, and test on a real-world dataset of fingerspelling signs. Our generated output sufficiently resembles real-world data on quantitative measurements, indicating that our techniques can be used to generate speech from signs without reliance on text. The use of synthetic datasets further reduces the reliance on real-world annotated data; results can be further improved using hybrid datasets that combine real-world and synthetic data. Our code and datasets are available at https://github.com/DanBigioi/Sign2Speech.
DOI: https://doi.org/10.56541/ctdh7516

Diversity Issues in Skin Lesion Datasets
N. Alipour, Ted Burke, J. Courtney
Melanoma is one of the most threatening skin cancers in the world and may spread to other parts of the body if not detected at an early stage. Researchers have therefore put considerable effort into computer-aided methods that help dermatologists recognise this kind of cancer. Many such methods exist, a large number of them based on deep learning models. To train these models to high accuracy, datasets large enough to cover gender, race, and skin type diversity are required. Although there is a large body of data on melanoma and skin lesions, most datasets do not cover a broad diversity of skin types, which can affect the accuracy of models trained on them. To understand the issue, the diversity of each database must first be assessed; then, based on the existing shortcomings, such as under-represented skin types, a suitable method must be developed to address them. This article summarizes the lack of gender, race, and skin type diversity in skin lesion datasets and takes a brief look at potential solutions, especially the less-discussed colour-based methods.
DOI: https://doi.org/10.56541/kppv3732

Geometrically reconstructing confocal microscopy images for modelling the retinal microvasculature as a 3D cylindrical network
Evan P. Troendle, P. Barabas, Tim Curtis
Microvascular networks can be modelled as networks of connected cylinders. Presently, however, there are limited approaches with which to recover these networks from biomedical images. We have therefore developed and implemented computer algorithms to geometrically reconstruct three-dimensional (3D) retinal microvascular networks from micrometre-scale imagery, resulting in a concise representation, stored in a delimited text file, of the two endpoints and radius of each detected cylinder. This format is suitable for a variety of purposes, including efficient simulations of molecular delivery. Here, we detail a semi-automated pipeline consisting of the detection of retinal microvascular volumes within 3D imaging datasets, the enhancement and analysis of these volumes for reconstruction, and the geometric construction algorithm itself, which converts voxel data into representative 3D cylindrical objects.
DOI: https://doi.org/10.56541/ktxe9847

Pre- and Post-Operative Analysis of Planar Radiographs in Total Hip Replacement
O. Denton, Christopher Madden-McKee, Janet C. Hill, D. Beverland, N. Dunne, A. Lennon
Computed tomography (CT) scans represent the gold standard for accuracy when preoperatively templating and postoperatively assessing the hip. In standard practice, however, planar radiographs are used, sacrificing accuracy. In this work, a method is proposed to more accurately assess femoral offset and neck-shaft angle from two planar radiographs (frontal and lateral), allowing more reliable templating of a modular stem. A second method is proposed to accurately assess postoperative stem version from planar frontal radiographs.
DOI: https://doi.org/10.56541/exjl3727

Integrating feature attribution methods into the loss function of deep learning classifiers
James Callanan, Carles Garcia-Cabrera, Niamh Belton, G. Roshchupkin, Kathleen M. Curran
Feature attribution methods are typically used post-training to judge whether a deep learning classifier relies on meaningful concepts in an input image when making classifications. In this study, we propose using feature attribution methods to give a classifier automated feedback throughout the training process via a novel loss function, which we call a heatmap loss function. Heatmap loss functions enable us to incentivize a model to rely on relevant sections of the input image when making classifications. Two groups of models were trained, one with a heatmap loss function and the other with categorical cross entropy (CCE). Models trained with the heatmap loss function achieved equivalent classification accuracies on a test dataset of synthesised cardiac MRI slices; moreover, HiResCAM heatmaps suggest that these models relied to a greater extent on regions of the MRI slices within the heart. A further experiment demonstrated how heatmap loss functions can prevent deep learning classifiers from using non-causal concepts that disproportionately co-occur with images of a certain class. This suggests that heatmap loss functions could be used to prevent models from learning dataset biases by directing where the model should look when making classifications.
DOI: https://doi.org/10.56541/omxa8857

Triple Loss based Satellite Image Localisation for Aerial Platforms
Eduardo Andres Avila Herrera, Tim McCarthy, J. McDonald
We present a vision-based technique for aerial platform localisation using satellite imagery. Our approach applies a modified VGG16 network in conjunction with a triplet loss to encode aerial views as discriminative scene embeddings. The platform is localised by comparing the encoding of its current view with a database of pre-encoded embeddings using a cosine similarity metric. Recent image-based localisation research has shown the potential of such learned embeddings; however, to ensure reliable matching, they require dense sampling of views of the environment, limiting their operational area. In contrast, our proposed architecture in combination with the triplet loss is robust over greater spatial shifts, reducing the need for dense sampling. We demonstrate these improvements through comparison with a state-of-the-art approach using simulated ground-truth sequences derived from a real-world satellite dataset covering a 1.5 km × 1 km region in Karlsruhe.
DOI: https://doi.org/10.56541/pjfn5642

Influence of Magnification in Deep Learning Aided Image Segmentation in Histological Digital Image Analysis
Kris McCombe, Stephanie G Craig, Jacqueline James, R. Gault
The use of digital pathology has grown significantly for both healthcare and research purposes in recent years. With this comes the opportunity to develop systems supported by computer vision (CV) and artificial intelligence (AI), with the potential to improve patient management and quality of care. The accessibility of CV and AI toolboxes has resulted in the rapid application of image analysis in this domain, driven by accuracy-related metrics. In this short paper, however, we illustrate common pitfalls in the field through a semantic segmentation task, specifically showing how magnification can influence training data quality and demonstrating how this can ultimately affect model robustness.
DOI: https://doi.org/10.56541/rakl2135

Texture improvement for human shape estimation from a single image
Jorge Gonzalez Escribano, Susana Rauno, A. Swaminathan, David Smyth, A. Smolic
Current single-image human digitization techniques show promising results in the quality of the estimated geometry, but they often fall short in the texture of the generated 3D model, especially on the occluded side of the person, while some do not output a texture for the model at all. Our goal in this paper is to improve the predicted texture of these models without requiring any input beyond the original image used to generate the 3D model in the first place. To that end, we propose a novel way to predict the back view of the person by including semantic and positional information, which outperforms state-of-the-art techniques. Our method is based on a general-purpose image-to-image translation algorithm with conditional adversarial networks, adapted to predict the back view of a human. Furthermore, we use the predicted image to improve the texture of the estimated 3D model, and we provide a 3D dataset, V-Human, to train our method as well as any 3D human shape estimation algorithms that use meshes, such as PIFu.
DOI: https://doi.org/10.56541/soww6683