Pub Date : 2023-04-12DOI: 10.48550/arXiv.2304.05741
Beatriz Paula, Plinio Moreno
The human visual system processes images with varied degrees of resolution, with the fovea, a small portion of the retina, capturing the highest acuity region, which gradually declines toward the field of view's periphery. However, the majority of existing object localization methods rely on images acquired by image sensors with space-invariant resolution, ignoring biological attention mechanisms. As a region of interest pooling, this study employs a fixation prediction model that emulates human objective-guided attention of searching for a given class in an image. The foveated pictures at each fixation point are then classified to determine whether the target is present or absent in the scene. Throughout this two-stage pipeline method, we investigate the varying results obtained by utilizing high-level or panoptic features and provide a ground-truth label function for fixation sequences that is smoother, considering in a better way the spatial structure of the problem. Finally, we present a novel dual task model capable of performing fixation prediction and detection simultaneously, allowing knowledge transfer between the two tasks. We conclude that, due to the complementary nature of both tasks, the training process benefited from the sharing of knowledge, resulting in an improvement in performance when compared to the previous approach's baseline scores.
{"title":"Learning to search for and detect objects in foveal images using deep learning","authors":"Beatriz Paula, Plinio Moreno","doi":"10.48550/arXiv.2304.05741","DOIUrl":"https://doi.org/10.48550/arXiv.2304.05741","url":null,"abstract":"The human visual system processes images with varied degrees of resolution, with the fovea, a small portion of the retina, capturing the highest acuity region, which gradually declines toward the field of view's periphery. However, the majority of existing object localization methods rely on images acquired by image sensors with space-invariant resolution, ignoring biological attention mechanisms. As a region of interest pooling, this study employs a fixation prediction model that emulates human objective-guided attention of searching for a given class in an image. The foveated pictures at each fixation point are then classified to determine whether the target is present or absent in the scene. Throughout this two-stage pipeline method, we investigate the varying results obtained by utilizing high-level or panoptic features and provide a ground-truth label function for fixation sequences that is smoother, considering in a better way the spatial structure of the problem. Finally, we present a novel dual task model capable of performing fixation prediction and detection simultaneously, allowing knowledge transfer between the two tasks. We conclude that, due to the complementary nature of both tasks, the training process benefited from the sharing of knowledge, resulting in an improvement in performance when compared to the previous approach's baseline scores.","PeriodicalId":319553,"journal":{"name":"Iberian Conference on Pattern Recognition and Image Analysis","volume":"7 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116803412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-03-21DOI: 10.48550/arXiv.2303.11560
Harry Dobbs, O. Batchelor, Richard D. Green, J. Atlas
This paper introduces Smart-Tree, a supervised method for approximating the medial axes of branch skeletons from a tree point cloud. Smart-Tree uses a sparse voxel convolutional neural network to extract the radius and direction towards the medial axis of each input point. A greedy algorithm performs robust skeletonization using the estimated medial axis. Our proposed method provides robustness to complex tree structures and improves fidelity when dealing with self-occlusions, complex geometry, touching branches, and varying point densities. We evaluate Smart-Tree using a multi-species synthetic tree dataset and perform qualitative analysis on a real-world tree point cloud. Our experimentation with synthetic and real-world datasets demonstrates the robustness of our approach over the current state-of-the-art method. The dataset and source code are publicly available.
{"title":"Smart-Tree: Neural Medial Axis Approximation of Point Clouds for 3D Tree Skeletonization","authors":"Harry Dobbs, O. Batchelor, Richard D. Green, J. Atlas","doi":"10.48550/arXiv.2303.11560","DOIUrl":"https://doi.org/10.48550/arXiv.2303.11560","url":null,"abstract":"This paper introduces Smart-Tree, a supervised method for approximating the medial axes of branch skeletons from a tree point cloud. Smart-Tree uses a sparse voxel convolutional neural network to extract the radius and direction towards the medial axis of each input point. A greedy algorithm performs robust skeletonization using the estimated medial axis. Our proposed method provides robustness to complex tree structures and improves fidelity when dealing with self-occlusions, complex geometry, touching branches, and varying point densities. We evaluate Smart-Tree using a multi-species synthetic tree dataset and perform qualitative analysis on a real-world tree point cloud. Our experimentation with synthetic and real-world datasets demonstrates the robustness of our approach over the current state-of-the-art method. The dataset and source code are publicly available.","PeriodicalId":319553,"journal":{"name":"Iberian Conference on Pattern Recognition and Image Analysis","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115820300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-03-05DOI: 10.48550/arXiv.2303.02761
R. Heil, Eva Breznik
One of the factors limiting the performance of handwritten text recognition (HTR) for stenography is the small amount of annotated training data. To alleviate the problem of data scarcity, modern HTR methods often employ data augmentation. However, due to specifics of the stenographic script, such settings may not be directly applicable for stenography recognition. In this work, we study 22 classical augmentation techniques, most of which are commonly used for HTR of other scripts, such as Latin handwriting. Through extensive experiments, we identify a group of augmentations, including for example contained ranges of random rotation, shifts and scaling, that are beneficial to the use case of stenography recognition. Furthermore, a number of augmentation approaches, leading to a decrease in recognition performance, are identified. Our results are supported by statistical hypothesis testing. Links to the publicly available dataset and codebase are provided.
{"title":"A Study of Augmentation Methods for Handwritten Stenography Recognition","authors":"R. Heil, Eva Breznik","doi":"10.48550/arXiv.2303.02761","DOIUrl":"https://doi.org/10.48550/arXiv.2303.02761","url":null,"abstract":"One of the factors limiting the performance of handwritten text recognition (HTR) for stenography is the small amount of annotated training data. To alleviate the problem of data scarcity, modern HTR methods often employ data augmentation. However, due to specifics of the stenographic script, such settings may not be directly applicable for stenography recognition. In this work, we study 22 classical augmentation techniques, most of which are commonly used for HTR of other scripts, such as Latin handwriting. Through extensive experiments, we identify a group of augmentations, including for example contained ranges of random rotation, shifts and scaling, that are beneficial to the use case of stenography recognition. Furthermore, a number of augmentation approaches, leading to a decrease in recognition performance, are identified. Our results are supported by statistical hypothesis testing. Links to the publicly available dataset and codebase are provided.","PeriodicalId":319553,"journal":{"name":"Iberian Conference on Pattern Recognition and Image Analysis","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124802770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-03-01DOI: 10.48550/arXiv.2303.00403
Elisabeth Wetzer, Joakim Lindblad, Natavsa Sladoje
Multimodal imaging and correlative analysis typically require image alignment. Contrastive learning can generate representations of multimodal images, reducing the challenging task of multimodal image registration to a monomodal one. Previously, additional supervision on intermediate layers in contrastive learning has improved biomedical image classification. We evaluate if a similar approach improves representations learned for registration to boost registration performance. We explore three approaches to add contrastive supervision to the latent features of the bottleneck layer in the U-Nets encoding the multimodal images and evaluate three different critic functions. Our results show that representations learned without additional supervision on latent features perform best in the downstream task of registration on two public biomedical datasets. We investigate the performance drop by exploiting recent insights in contrastive learning in classification and self-supervised learning. We visualize the spatial relations of the learned representations by means of multidimensional scaling, and show that additional supervision on the bottleneck layer can lead to partial dimensional collapse of the intermediate embedding space.
{"title":"Can representation learning for multimodal image registration be improved by supervision of intermediate layers?","authors":"Elisabeth Wetzer, Joakim Lindblad, Natavsa Sladoje","doi":"10.48550/arXiv.2303.00403","DOIUrl":"https://doi.org/10.48550/arXiv.2303.00403","url":null,"abstract":"Multimodal imaging and correlative analysis typically require image alignment. Contrastive learning can generate representations of multimodal images, reducing the challenging task of multimodal image registration to a monomodal one. Previously, additional supervision on intermediate layers in contrastive learning has improved biomedical image classification. We evaluate if a similar approach improves representations learned for registration to boost registration performance. We explore three approaches to add contrastive supervision to the latent features of the bottleneck layer in the U-Nets encoding the multimodal images and evaluate three different critic functions. Our results show that representations learned without additional supervision on latent features perform best in the downstream task of registration on two public biomedical datasets. We investigate the performance drop by exploiting recent insights in contrastive learning in classification and self-supervised learning. We visualize the spatial relations of the learned representations by means of multidimensional scaling, and show that additional supervision on the bottleneck layer can lead to partial dimensional collapse of the intermediate embedding space.","PeriodicalId":319553,"journal":{"name":"Iberian Conference on Pattern Recognition and Image Analysis","volume":"144 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123242432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-03-05DOI: 10.48550/arXiv.2203.02740
C. F. G. Santos, Mateus Roder, L. A. Passos, J. P. Papa
In the last decade, exponential data growth supplied the machine learning-based algorithms' capacity and enabled their usage in daily life activities. Additionally, such an improvement is partially explained due to the advent of deep learning techniques, i.e., stacks of simple architectures that end up in more complex models. Although both factors produce outstanding results, they also pose drawbacks regarding the learning process since training complex models denotes an expensive task and results are prone to overfit the training data. A supervised regularization technique called MaxDropout was recently proposed to tackle the latter, providing several improvements concerning traditional regularization approaches. In this paper, we present its improved version called MaxDropoutV2. Results considering two public datasets show that the model performs faster than the standard version and, in most cases, provides more accurate results.
{"title":"MaxDropoutV2: An Improved Method to Drop out Neurons in Convolutional Neural Networks","authors":"C. F. G. Santos, Mateus Roder, L. A. Passos, J. P. Papa","doi":"10.48550/arXiv.2203.02740","DOIUrl":"https://doi.org/10.48550/arXiv.2203.02740","url":null,"abstract":"In the last decade, exponential data growth supplied the machine learning-based algorithms' capacity and enabled their usage in daily life activities. Additionally, such an improvement is partially explained due to the advent of deep learning techniques, i.e., stacks of simple architectures that end up in more complex models. Although both factors produce outstanding results, they also pose drawbacks regarding the learning process since training complex models denotes an expensive task and results are prone to overfit the training data. A supervised regularization technique called MaxDropout was recently proposed to tackle the latter, providing several improvements concerning traditional regularization approaches. In this paper, we present its improved version called MaxDropoutV2. Results considering two public datasets show that the model performs faster than the standard version and, in most cases, provides more accurate results.","PeriodicalId":319553,"journal":{"name":"Iberian Conference on Pattern Recognition and Image Analysis","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133715740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-03-05DOI: 10.48550/arXiv.2203.02728
Thierry Pinheiro Moreira, M. C. S. Santana, L. A. Passos, J. Papa, K. Costa
Seam carving is a computational method capable of resizing images for both reduction and expansion based on its content, instead of the image geometry. Although the technique is mostly employed to deal with redundant information, i.e., regions composed of pixels with similar intensity, it can also be used for tampering images by inserting or removing relevant objects. Therefore, detecting such a process is of extreme importance regarding the image security domain. However, recognizing seam-carved images does not represent a straightforward task even for human eyes, and robust computation tools capable of identifying such alterations are very desirable. In this paper, we propose an end-to-end approach to cope with the problem of automatic seam carving detection that can obtain state-of-the-art results. Experiments conducted over public and private datasets with several tampering configurations evidence the suitability of the proposed model.
{"title":"An End-to-End Approach for Seam Carving Detection using Deep Neural Networks","authors":"Thierry Pinheiro Moreira, M. C. S. Santana, L. A. Passos, J. Papa, K. Costa","doi":"10.48550/arXiv.2203.02728","DOIUrl":"https://doi.org/10.48550/arXiv.2203.02728","url":null,"abstract":"Seam carving is a computational method capable of resizing images for both reduction and expansion based on its content, instead of the image geometry. Although the technique is mostly employed to deal with redundant information, i.e., regions composed of pixels with similar intensity, it can also be used for tampering images by inserting or removing relevant objects. Therefore, detecting such a process is of extreme importance regarding the image security domain. However, recognizing seam-carved images does not represent a straightforward task even for human eyes, and robust computation tools capable of identifying such alterations are very desirable. In this paper, we propose an end-to-end approach to cope with the problem of automatic seam carving detection that can obtain state-of-the-art results. Experiments conducted over public and private datasets with several tampering configurations evidence the suitability of the proposed model.","PeriodicalId":319553,"journal":{"name":"Iberian Conference on Pattern Recognition and Image Analysis","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124301209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2021-02-21DOI: 10.1007/978-3-031-04881-4_46
Shafkat Farabi, H. Himel, Fakhruddin Gazzali, Md. Bakhtiar Hasan, M. H. Kabir, M. Farazi
{"title":"Improving Action Quality Assessment Using Weighted Aggregation","authors":"Shafkat Farabi, H. Himel, Fakhruddin Gazzali, Md. Bakhtiar Hasan, M. H. Kabir, M. Farazi","doi":"10.1007/978-3-031-04881-4_46","DOIUrl":"https://doi.org/10.1007/978-3-031-04881-4_46","url":null,"abstract":"","PeriodicalId":319553,"journal":{"name":"Iberian Conference on Pattern Recognition and Image Analysis","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117312574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2019-07-01DOI: 10.1007/978-3-030-31321-0_42
S. Lafuente-Arroyo, S. Maldonado-Bascón, H. Gómez-Moreno, C. Alén-Cordero
{"title":"Segmentation in Corridor Environments: Combining Floor and Ceiling Detection","authors":"S. Lafuente-Arroyo, S. Maldonado-Bascón, H. Gómez-Moreno, C. Alén-Cordero","doi":"10.1007/978-3-030-31321-0_42","DOIUrl":"https://doi.org/10.1007/978-3-030-31321-0_42","url":null,"abstract":"","PeriodicalId":319553,"journal":{"name":"Iberian Conference on Pattern Recognition and Image Analysis","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130596539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2019-07-01DOI: 10.1007/978-3-030-31332-6_32
A. Ruiz, J. S. Mejía, J. M. López, B. Giraldo
{"title":"Characterization of Cardiac and Respiratory System of Healthy Subjects in Supine and Sitting Position","authors":"A. Ruiz, J. S. Mejía, J. M. López, B. Giraldo","doi":"10.1007/978-3-030-31332-6_32","DOIUrl":"https://doi.org/10.1007/978-3-030-31332-6_32","url":null,"abstract":"","PeriodicalId":319553,"journal":{"name":"Iberian Conference on Pattern Recognition and Image Analysis","volume":"19 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114119454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}