Learning-based compression of visual objects for smart surveillance
Pub Date: 2022-04-19 | DOI: 10.1109/IPTA54936.2022.9784147
Ruben Antonio, S. Faria, Luis M. N. Tavora, A. Navarro, P. Assunção
Advanced video applications in smart environments (e.g., smart cities) bring new challenges associated with increasingly intelligent systems and demanding requirements in emerging fields such as urban surveillance, industrial computer vision and medicine. As a consequence, a huge amount of visual data is captured to be analyzed by machines driven by task-specific algorithms. In this context, this paper proposes an efficient learning-based approach to compress relevant visual objects that are captured in surveillance contexts and delivered for machine vision processing. An object-based compression scheme is devised, comprising multiple autoencoders, each one optimised to produce an efficient latent representation of a corresponding object class. The performance of the proposed approach is evaluated with two types of visual objects (persons and faces) and two machine vision tasks (class identification and object recognition), in addition to traditional image quality metrics such as PSNR and VMAF. In comparison with the Versatile Video Coding (VVC) standard, the proposed approach achieves significantly better coding efficiency, e.g., up to 46.7% BD-rate reduction. The accuracy of the machine vision tasks is also significantly higher when they are performed on visual objects compressed with the proposed scheme rather than with VVC. These results demonstrate that the proposed learning-based approach is a more efficient solution for compressing visual objects than standard encoding.
{"title":"Learning-based compression of visual objects for smart surveillance","authors":"Ruben Antonio, S. Faria, Luis M. N. Tavora, A. Navarro, P. Assunção","doi":"10.1109/IPTA54936.2022.9784147","DOIUrl":"https://doi.org/10.1109/IPTA54936.2022.9784147","url":null,"abstract":"Advanced video applications in smart environments (e.g., smart cities) bring different challenges associated with increasingly intelligent systems and demanding requirements in emerging fields such as urban surveillance, computer vision in industry, medicine and others. As a consequence, a huge amount of visual data is captured to be analyzed by task-algorithm driven machines. In this context, this paper proposes an efficient learning-based approach to compress relevant visual objects, captured in surveillance contexts and delivered for machine vision processing. An object-based compression scheme is devised, comprising multiple autoencoders, each one optimised to produce an efficient latent representation of a corresponding object class. The performance of the proposed approach is evaluated with two types of visual objects: persons and faces and two task-algorithms: class identification and object recognition, besides traditional image quality metrics like PSNR and VMAF. In comparison with the Versatile Video Coding (VVC) standard, the proposed approach achieves significantly better coding efficiency than the VVC, e.g., up to 46.7% BD-rate reduction. The accuracy of the machine vision tasks is also significantly higher when performed over visual objects compressed with the proposed scheme in comparison with the same tasks performed over the same visual objects compressed with the VVC. These results demonstrate that the learning-based approach proposed in this paper is a more efficient solution for compression of visual objects than standard encoding.","PeriodicalId":381729,"journal":{"name":"2022 Eleventh International Conference on Image Processing Theory, Tools and Applications (IPTA)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126634587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spatial Maturity Regression for the Classification of Hematopoietic Cells
Pub Date: 2022-04-19 | DOI: 10.1109/IPTA54936.2022.9784130
Philipp Gräbel, Julian Thull, M. Crysandt, B. Klinkhammer, P. Boor, T. Brümmendorf, D. Merhof
In contrast to peripheral blood, cells in bone marrow microscopy images are characterized not only by their cell lineage but also by a maturity stage within that lineage. As maturation is a continuous process, the differentiation between the various stages falls into the category of (ordinal) regression. In this work, we propose Spatial Maturity Regression, a technique that regularizes the learning process to enforce a sensible positioning of maturity stages in the embedding space. To this end, we propose and evaluate several curve models, target definitions and loss functions that incorporate this domain knowledge. We show that the classification F-scores improve by up to 2.4 percentage points when enforcing regression targets along learnable curves in the embedding space. The technique further allows visualization of individual predictions by providing the projected position along the learnt curve.
{"title":"Spatial Maturity Regression for the Classification of Hematopoietic Cells","authors":"Philipp Gräbel, Julian Thull, M. Crysandt, B. Klinkhammer, P. Boor, T. Brümmendorf, D. Merhof","doi":"10.1109/IPTA54936.2022.9784130","DOIUrl":"https://doi.org/10.1109/IPTA54936.2022.9784130","url":null,"abstract":"In contrast to peripheral blood, cells in bone marrow microscopy images are not only characterized by the cell lineage but also a maturity stage within the lineage. As maturation is a continuous process, the differentiation between various stages falls into the category of (ordinal) regression. In this work, we propose Spatial Maturity Regression - a technique that regularizes the learning process to enforce a sensible positioning of maturity stages in the embedding space. To this end, we propose and evaluate several curve models, target definitions and loss function that incorporate this domain knowledge. We show that the classification F-scores improve up to 2.4 percentage points when enforcing regression targets along learnable curves in the embedding space. This technique further allows visualization of individual predictions by providing the projected position along the learnt curve.","PeriodicalId":381729,"journal":{"name":"2022 Eleventh International Conference on Image Processing Theory, Tools and Applications (IPTA)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126297587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CNN for Hand Washing Movement Classification: What Matters More - the Approach or the Dataset?
Pub Date: 2022-04-19 | DOI: 10.1109/IPTA54936.2022.9784153
Atis Elsts, M. Ivanovs, R. Kadikis, O. Sabelnikovs
Good hand hygiene is one of the key factors in preventing infectious diseases, including COVID-19. Advances in machine learning have enabled automated hand hygiene evaluation, with research papers reporting highly accurate hand washing movement classification from video data. However, existing studies typically use datasets collected in lab conditions. In this paper, we apply state-of-the-art techniques, such as MobileNetV2-based CNN, including two-stream and recurrent CNN, to three different datasets: a good-quality and uniform lab-based dataset, a more diverse lab-based dataset, and a large-scale real-life dataset collected in a hospital. The results show that while many of the approaches achieve good accuracy on the first dataset, the accuracy drops significantly on the more complex datasets. Moreover, all approaches fail to generalize on the third dataset, showing only slightly better than random accuracy on videos held out from the training set. This suggests that despite the high accuracy routinely reported in the research literature, the transition to real-world applications for hand washing quality monitoring will not be straightforward.
{"title":"CNN for Hand Washing Movement Classification: What Matters More - the Approach or the Dataset?","authors":"Atis Elsts, M. Ivanovs, R. Kadikis, O. Sabelnikovs","doi":"10.1109/IPTA54936.2022.9784153","DOIUrl":"https://doi.org/10.1109/IPTA54936.2022.9784153","url":null,"abstract":"Good hand hygiene is one of the key factors in preventing infectious diseases, including COVID-19. Advances in machine learning have enabled automated hand hygiene evaluation, with research papers reporting highly accurate hand washing movement classification from video data. However, existing studies typically use datasets collected in lab conditions. In this paper, we apply state-of-the-art techniques such as MobileNetV2 based CNN, including two-stream and recurrent CNN, to three different datasets: a good-quality and uniform lab-based dataset, a more diverse lab-based dataset, and a large-scale real-life dataset collected in a hospital. The results show that while many of the approaches show good accuracy on the first dataset, the accuracy drops significantly o n t he m ore complex datasets. Moreover, all approaches fail to generalize on the third dataset, and only show slightly-better-than random accuracy on videos held out from the training set. This suggests that despite the high accuracy routinely reported in the research literature, the transition to real-world applications for hand washing quality monitoring is not going to be straightforward.","PeriodicalId":381729,"journal":{"name":"2022 Eleventh International Conference on Image Processing Theory, Tools and Applications (IPTA)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132875635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Oral Session 1
Pub Date: 2022-04-19 | DOI: 10.1109/ipta54936.2022.9784148
François Brémond
Front matter: Message from Program Chairs, Conference Organization, Reviewers
{"title":"Oral Session 1","authors":"François Brémond","doi":"10.1109/ipta54936.2022.9784148","DOIUrl":"https://doi.org/10.1109/ipta54936.2022.9784148","url":null,"abstract":"Message from Program Chairs.................................................................................................................xii Conference Organization..........................................................................................................................xiv Reviewers...................................................................................................................................................xvi","PeriodicalId":381729,"journal":{"name":"2022 Eleventh International Conference on Image Processing Theory, Tools and Applications (IPTA)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132267176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analyzing the Variability of Subjective Image Quality Ratings for Different Distortions
Pub Date: 2022-04-19 | DOI: 10.1109/IPTA54936.2022.9784120
Olga Cherepkova, S. A. Amirshahi, Marius Pedersen
When it comes to evaluating the quality of images, individual observers show different opinions depending on the type of distortion affecting the image. While one observer may judge a distortion to have a dramatic influence on the quality of an image, another may see the same distortion as having no important effect on the same image. Using a subjective experiment, we aim to identify the distortions which show the largest variability among observers. For this, 22 observers evaluated the quality of 10 reference images and the 630 test images created from them (21 distortions at three levels). Our results show that the highest variability in subjective scores is linked to distortions such as saturation, contrast, sharpness, quantization, some types of added noise, and radial lens distortion.
{"title":"Analyzing the Variability of Subjective Image Quality Ratings for Different Distortions","authors":"Olga Cherepkova, S. A. Amirshahi, Marius Pedersen","doi":"10.1109/IPTA54936.2022.9784120","DOIUrl":"https://doi.org/10.1109/IPTA54936.2022.9784120","url":null,"abstract":"When it comes to evaluating the quality of images, individual observers show different opinions depending on the type of distortion affecting the quality of the image. While in the opinion of one observer a distortion could have a dramatic influence on the quality of the image, another observer could see the same distortion as not having an important effect on the quality of the same image. Using a subjective experiment, we aim to identify the distortions which show the largest variability among observers. For this, 22 observers evaluated the quality of 10 reference images and the 630 test images created from them (21 distortions at three levels). Our results show that the highest variability in subjective scores is linked to distortions like saturation, contrast, sharpness, quantization, some types of added noise, and radial lens distortion.","PeriodicalId":381729,"journal":{"name":"2022 Eleventh International Conference on Image Processing Theory, Tools and Applications (IPTA)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125845044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Special Session 2: Explainable AI for Medical Imaging
Pub Date: 2022-04-19 | DOI: 10.1109/ipta54936.2022.9784137
{"title":"Special Session 2: Explainable AI for Medical Imaging","authors":"","doi":"10.1109/ipta54936.2022.9784137","DOIUrl":"https://doi.org/10.1109/ipta54936.2022.9784137","url":null,"abstract":"","PeriodicalId":381729,"journal":{"name":"2022 Eleventh International Conference on Image Processing Theory, Tools and Applications (IPTA)","volume":"179 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115480200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Explainability for Medical Image Captioning
Pub Date: 2022-04-19 | DOI: 10.1109/IPTA54936.2022.9784146
D. Beddiar, Mourad Oussalah, T. Seppänen
Medical image captioning is the process of generating clinically significant descriptions for medical images, with many applications of which medical report generation is the most frequent. In general, automatic captioning of medical images is of great interest to medical experts, since it offers assistance in diagnosis and disease treatment and automates the workflow of health practitioners. Recently, many efforts have been put into obtaining accurate descriptions, but medical image captioning still produces weak and incorrect descriptions. To alleviate this issue, it is important to explain why the model produced a particular caption based on specific features. This is performed through Explainable Artificial Intelligence (XAI), which aims to unfold the 'black-box' nature of deep-learning-based models. We present in this paper an explainable module for medical image captioning that provides a sound interpretation of our attention-based encoder-decoder model by explaining the correspondence between visual and semantic features. For that, we exploit self-attention to compute the word importance of semantic features, and visual attention to compute the relevant regions of the image that correspond to each generated word of the caption, in addition to visualizing the visual features extracted at each layer of the Convolutional Neural Network (CNN) encoder. We finally evaluate our model on the ImageCLEF medical captioning dataset.
{"title":"Explainability for Medical Image Captioning","authors":"D. Beddiar, Mourad Oussalah, T. Seppänen","doi":"10.1109/IPTA54936.2022.9784146","DOIUrl":"https://doi.org/10.1109/IPTA54936.2022.9784146","url":null,"abstract":"Medical image captioning is the process of generating clinically significant descriptions to medical images, which has many applications among which medical report generation is the most frequent one. In general, automatic captioning of medical images is of great interest for medical experts since it offers assistance in diagnosis, disease treatment and automating the workflow of the health practitioners. Recently, many efforts have been put forward to obtain accurate descriptions but medical image captioning still provides weak and incorrect descriptions. To alleviate this issue, it is important to explain why the model produced a particular caption based on some specific features. This is performed through Artificial Intelligence Explainability (XAI), which aims to unfold the ‘black-box’ feature of deep-learning based models. We present in this paper an explainable module for medical image captioning that provides a sound interpretation of our attention-based encoder-decoder model by explaining the correspondence between visual features and semantic features. We exploit for that, self-attention to compute word importance of semantic features and visual attention to compute relevant regions of the image that correspond to each generated word of the caption in addition to visualization of visual features extracted at each layer of the Convolutional Neural Network (CNN) encoder. We finally evaluate our model using the ImageCLEF medical captioning dataset.","PeriodicalId":381729,"journal":{"name":"2022 Eleventh International Conference on Image Processing Theory, Tools and Applications (IPTA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115825261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Novel and Fast Approach for Reconstructing CASSI-Raman Spectra using Generative Adversarial Networks
Pub Date: 2022-04-19 | DOI: 10.1109/IPTA54936.2022.9784152
Jacob Eek, David Gustafsson, Ludwig Hollmann, M. Nordberg, I. Skog, Magnus Malmström
Raman spectroscopy in conjunction with a Coded Aperture Snapshot Spectral Imaging (CASSI) system allows for the detection of small amounts of explosives from stand-off distances. The Compressed Sensing (CS) measurements obtained from CASSI consist of mixed spatial and spectral information, from which a HyperSpectral Image (HSI) can be reconstructed. The HSI contains Raman spectra for all spatial locations in the scene, revealing the presence of substances. In this paper we present the possibility of utilizing a learned prior, in the form of a conditional generative model, for HSI reconstruction using CS. A Generative Adversarial Network (GAN) is trained on simulated samples of HSI, conditioned on their respective CASSI measurements to refine the prior. Two different types of simulated HSI were investigated, in which spatial overlap of substances was either allowed or disallowed. The results show that the developed method produces precise reconstructions of HSI from their CASSI measurements in a matter of seconds.
{"title":"A Novel and Fast Approach for Reconstructing CASSI-Raman Spectra using Generative Adversarial Networks","authors":"Jacob Eek, David Gustafsson, Ludwig Hollmann, M. Nordberg, I. Skog, Magnus Malmström","doi":"10.1109/IPTA54936.2022.9784152","DOIUrl":"https://doi.org/10.1109/IPTA54936.2022.9784152","url":null,"abstract":"Raman spectroscopy in conjunction with a Coded Aperture Snapshot Spectral Imaging (CASSI) system allows for detection of small amounts of explosives from stand-off distances. The obtained Compressed Sensing (CS) measurements from CASSI consists of mixed spatial and spectral information, from which a HyperSpectral Image (HSI) can be reconstructed. The HSI contains Raman spectra for all spatial locations in the scene, revealing the existence of substances. In this paper we present the possibility of utilizing a learned prior in the form of a conditional generative model for HSI reconstruction using CS. A Generative Adversarial Network (GAN) is trained using simulated samples of HSI, and conditioning on their respective CASSI measurements to refine the prior. Two different types of simulated HSI were investigated, where spatial overlap of substances was either allowed or disallowed. The results show that the developed method produces precise reconstructions of HSI from their CASSI measurements in a matter of seconds.","PeriodicalId":381729,"journal":{"name":"2022 Eleventh International Conference on Image Processing Theory, Tools and Applications (IPTA)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128979204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluation of Multi-Scale Multiple Instance Learning to Improve Thyroid Cancer Classification
Pub Date: 2022-04-19 | DOI: 10.48550/arXiv.2204.10942
M. Tschuchnig, Philipp Grubmüller, Lea Maria Stangassinger, Christina Kreutzer, S. Couillard-Després, G. Oostingh, A. Hittmair, M. Gadermayr
Thyroid cancer is currently the fifth most common malignancy diagnosed in women. Since differentiation of cancer sub-types is important for treatment, and current manual methods are time consuming and subjective, automatic computer-aided differentiation of cancer types is crucial. Manual differentiation of thyroid cancer is based on tissue sections analysed by pathologists using histological features. Due to the enormous size of gigapixel whole slide images, holistic classification using deep learning methods is not feasible. Patch-based multiple instance learning, combined with aggregations such as bag-of-words, is a common approach. This work's contribution is to extend a patch-based state-of-the-art method by generating and combining feature vectors at three different patch resolutions and analysing three distinct ways of combining them. The results showed improvements for one of the three multi-scale approaches, while the others led to decreased scores. This provides motivation for the analysis and discussion of the individual approaches.
{"title":"Evaluation of Multi-Scale Multiple Instance Learning to Improve Thyroid Cancer Classification","authors":"M. Tschuchnig, Philipp Grubmüller, Lea Maria Stangassinger, Christina Kreutzer, S. Couillard-Després, G. Oostingh, A. Hittmair, M. Gadermayr","doi":"10.48550/arXiv.2204.10942","DOIUrl":"https://doi.org/10.48550/arXiv.2204.10942","url":null,"abstract":"Thyroid cancer is currently the fifth most common malignancy diagnosed in women. Since differentiation of cancer sub-types is important for treatment and current, manual methods are time consuming and subjective, automatic computer-aided differentiation of cancer types is crucial. Manual differentiation of thyroid cancer is based on tissue sections, analysed by pathologists using histological features. Due to the enormous size of gigapixel whole slide images, holistic classification u sing deep learning methods is not feasible. Patch based multiple instance learning approaches, combined with aggre-gations such as bag-of-words, is a common approach. This work's contribution is to extend a patch based state-of-the-art method by generating and combining feature vectors of three different patch resolutions and analysing three distinct ways of combining them. The results showed improvements in one of the three multi-scale approaches, while the others led to decreased scores. This provides motivation for analysis and discussion of the individual approaches.","PeriodicalId":381729,"journal":{"name":"2022 Eleventh International Conference on Image Processing Theory, Tools and Applications (IPTA)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132997593","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}