Urban simulations for disaster prevention, urban design, and assisted navigation rely heavily on urban geometric models. While acquiring large urban areas terrestrially takes a lot of time, government organizations have already conducted massive aerial LiDAR surveys, some even at the national level. This work provides a pipeline for extracting multi-scale point clouds from 2D building footprints and airborne LiDAR data, with processing that depends on whether the points represent buildings, vegetation, or ground. We denoise the roof slopes, match the vegetation, and roughly recreate the building façades, which are frequently hidden from aerial acquisition, using a parametric representation of geometric primitives. Because we annotate the new version of the original point cloud with the parametric equations representing each part, we can then sample the urban geometry at multiple scales to obtain a 3D urban representation. We mainly tested our methodology in a real-world setting, the city of Genoa, which includes historical buildings and is heavily characterized by irregular ground slopes. Moreover, we present urban reconstruction results on parts of two other cities: Matera, which has a complex morphology like Genoa, and Rotterdam.
Title: From aerial LiDAR point clouds to multiscale urban representation levels by a parametric resampling. Authors: Chiara Romanengo, Bianca Falcidieno, Silvia Biasotti. Journal: Computers & Graphics-UK, vol. 123, Article 104022. Pub Date: 2024-07-31. DOI: 10.1016/j.cag.2024.104022. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0097849324001572/pdfft?md5=a617708d0acaf24ecd321d09a5821721&pid=1-s2.0-S0097849324001572-main.pdf
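The idea of sampling a parametrically described part at several scales can be illustrated with a small sketch. It is not the authors' pipeline: it assumes a façade is a planar rectangle P(s, t) = origin + s·u_axis + t·v_axis with s, t in [0, 1], and the function name `sample_plane_patch`, the `step` parameter, and the example dimensions are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def sample_plane_patch(
    origin: Sequence[float],
    u_axis: Sequence[float],
    v_axis: Sequence[float],
    step: float,
) -> List[Point]:
    """Sample P(s, t) = origin + s*u_axis + t*v_axis on a regular grid.

    `step` is the approximate spacing between samples, so a smaller
    step gives a denser (finer-scale) point set for the same equation.
    """
    def length(vec: Sequence[float]) -> float:
        return math.sqrt(sum(c * c for c in vec))

    # Number of intervals along each edge, at least one per edge.
    n_u = max(1, round(length(u_axis) / step))
    n_v = max(1, round(length(v_axis) / step))

    points: List[Point] = []
    for i in range(n_u + 1):
        for j in range(n_v + 1):
            s, t = i / n_u, j / n_v
            points.append(
                tuple(o + s * u + t * v for o, u, v in zip(origin, u_axis, v_axis))
            )
    return points


# Hypothetical facade: 10 m wide, 6 m tall, standing on the x axis.
facade = ((0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 0.0, 6.0))
coarse = sample_plane_patch(*facade, step=2.0)   # 24 points
medium = sample_plane_patch(*facade, step=1.0)   # 77 points
fine = sample_plane_patch(*facade, step=0.5)     # 273 points
```

Because only the plane equation and its extent are stored, any sampling density can be regenerated on demand instead of keeping several point clouds.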
Analysis of 3D textures, also known as relief patterns, is a challenging task that requires separating repetitive surface patterns from the underlying global geometry. Existing works classify entire surfaces based on one or a few patterns by extracting ad-hoc statistical properties. Unfortunately, these methods are not suitable for objects with multiple geometric textures and perform poorly on more complex shapes. In this paper, we propose a neural network for binary segmentation that infers per-point labels based on the presence of surface relief patterns. We evaluated the proposed architecture on a high-resolution point cloud dataset, surpassing the state of the art while maintaining memory and computation efficiency.
Title: Binary segmentation of relief patterns on point clouds. Authors: Gabriele Paolini, Claudio Tortorici, Stefano Berretti. Journal: Computers & Graphics-UK, vol. 123, Article 104020. Pub Date: 2024-07-31. DOI: 10.1016/j.cag.2024.104020. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0097849324001559/pdfft?md5=2a3d2170481b5dae4c7f729baa4b2914&pid=1-s2.0-S0097849324001559-main.pdf
Pub Date: 2024-07-29. DOI: 10.1016/j.cag.2024.104014
Cristiano N. Rodrigues, Ian M. Nunes, Matheus B. Pereira, Hugo Oliveira, Jefersson A. dos Santos
Image segmentation is one of the most classical computer vision tasks. Segmentation tasks yield a set of classes attributed to individual pixels instead of sparsely predicted images or patches, such as in classification or detection tasks. However, creating annotation sets for pixelwise tasks is very costly, often requiring hours to label single samples in images with multiple classes of objects. In this context, unsupervised learning can be leveraged to expedite the annotation procedure and/or to guide the segmentation algorithms altogether without the need for manual annotations. Classical unsupervised segmentation methods leveraged techniques from areas such as graph theory, image processing, clustering, or supervised classifiers in order to achieve “shallow” pixelwise classification. These techniques usually aim to achieve superpixel over-segmentations by grouping similar pixels that should pertain to the same object. Modern deep unsupervised approaches for image segmentation aim to group pixels in a data-driven way by using the capabilities of deep architectures to process unstructured data such as images. Later, self-supervised learning bypassed the need for labels via pretext tasks, compelling deep architectures to learn more generic features capable of enhancing downstream tasks, including segmentation. The generalized representations produced by unsupervised models have propelled the recent progress in self-supervised, few- and zero-shot learning and even general-purpose foundational models in computer vision, yielding state-of-the-art results across diverse tasks and datasets. This paper provides an overview of unsupervised and generalizable approaches for image segmentation, introduces key concepts and terminology, and discusses the main aspects of state-of-the-art methods. Additionally, we highlight prominent applications in various domains such as remote sensing, medical imaging, and geology.
Finally, we discuss trends and future directions for state-of-the-art unsupervised image segmentation.
Title: From superpixels to foundational models: An overview of unsupervised and generalizable image segmentation. Journal: Computers & Graphics-UK, vol. 123, Article 104014.
Pub Date: 2024-07-29. DOI: 10.1016/j.cag.2024.104021
Bianca Falcidieno, Brian Wyvill, Ergun Akleman, Jorg Peters
The Shape Modeling International awards (SMI awards) were introduced to commemorate the passing of SMI founder, Professor Kunii. Since 2021, the SMI awards have recognized exceptional contributors to Shape Modeling. Currently, there are three awards: the Tosiyasu Kunii Distinguished Researcher, the Young Investigator, and the Alexander Pasko Service Award. The 2024 Distinguished Researcher awardees are Gershon Elber and Stefanie Hahmann. The 2024 Young Investigators are Gianmarco Cherchi and Amal Dev Parakkat. The 2024 Service Awardee is Ergun Akleman. This article provides interviews with the five SMI 2024 award winners.
Title: Shape Modeling International (SMI) 2024 awards interviews with SMI’2024 award winners. Journal: Computers & Graphics-UK, vol. 123, Article 104021.
Pub Date: 2024-07-25. DOI: 10.1016/j.cag.2024.104013
Leonardo Ferreira, Gustavo Moreira, Maryam Hosseini, Marcos Lage, Nivan Ferreira, Fabio Miranda
Over the past decade, there has been a significant increase in the development of visual analytics systems dedicated to addressing urban issues. These systems distill intricate urban analysis workflows into intuitive, interactive visual representations and interfaces, enabling users to explore, understand, and derive insights from large and complex data, including street-level imagery, street networks, and building geometries. Developing urban visual analytics systems, however, is a challenging endeavor that requires considerable programming expertise and interaction between various multidisciplinary stakeholders. This situation often leads to monolithic and isolated prototypes that are hard to reproduce, combine, or extend. Concurrently, there has been an increase in the availability of general and urban-specific toolkits, frameworks, and authoring tools that are open source and abstract away the need to implement low-level visual analytics functionalities. This paper provides a hierarchical taxonomy of urban visual analytics systems to contextualize how they are usually designed, implemented, and evaluated. We develop this taxonomy across three distinct levels (i.e., dimensions, categories, and tags), juxtaposing visualization with analytics, data, and system dimensions. We then assess the extent to which current open-source toolkits, frameworks, and authoring tools can effectively support the development of components tailored to urban visual analytics, identifying their strengths and limitations in addressing the unique challenges posed by urban data. In doing so, we offer a roadmap that can guide the effective employment of existing resources and chart a pathway for developing and refining future systems.
Title: Assessing the landscape of toolkits, frameworks, and authoring tools for urban visual analytics systems. Journal: Computers & Graphics-UK, vol. 123, Article 104013. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0097849324001481/pdfft?md5=5e1b2ee787bdf31bc660006341515d9a&pid=1-s2.0-S0097849324001481-main.pdf
Pub Date: 2024-07-25. DOI: 10.1016/j.cag.2024.104018
Jin Xiang, Huihuang Zhao, Pengfei Li, Yue Deng, Weiliang Meng
Recent research in arbitrary style transfer has highlighted challenges in maintaining the balance between content structure and style patterns. Moreover, the improper application of style patterns onto the content image often results in suboptimal quality. In this paper, a novel style transfer network, called MCNet, is proposed. It is based on multi-feature correlations. To better explore the intrinsic relationship between the style image and the content image and to transfer the most suitable style onto the content image, a novel Global Style-Attentional Transfer Module, named GSATM, is introduced in this work. GSATM comprises two parts: Forward Adaptive Style Transformation (FAST) and Delayed Style Transformation (DST). The former analyzes the relationship between style and content features and fine-tunes the style features, whereas the latter transfers the content features based on the fine-tuned style features. Moreover, a new encoding and decoding structure is designed to effectively handle the output of GSATM. Extensive quantitative and qualitative experiments fully demonstrate the superiority of our algorithm. Project page: https://github.com/XiangJinCherry/MCNet.
Title: Arbitrary style transfer via multi-feature correlation. Journal: Computers & Graphics-UK, vol. 123, Article 104018.
Pub Date: 2024-07-22. DOI: 10.1016/j.cag.2024.104017
Heng Zhang, Yuanyuan Pu, Zhengpeng Zhao, Yupan Li, Xin Li, Rencan Nie
A good image-to-image translation framework acquires an explicit and credible mapping between the source domain and the target domains while satisfying two requirements: simplicity, and extensibility over multiple translation tasks. To this end, we design a concise but versatile generative model for image-to-image translation. Our method includes three major ingredients. First, inspired by the popular Spatially Adaptive Normalization (SPADE) layer, we introduce a novel Semantics-Appearance Spatially Adaptive Normalization (SA-SPADE), which takes into account both semantic structure and style appearance. This enables semantic composition and style appearance information to be sufficiently captured and integrated by our normalization layers. Thanks to SA-SPADE, our model extends to multiple image-to-image translation tasks in an unsupervised or supervised way. Second, we carefully designed two symmetrical network branches, namely the Semantic Branch (SB) and the Appearance Branch (AB), to provide semantic and appearance information, respectively, for our normalization layer. Third, we propose a novel Semantic-aware Contrastive Loss (SCL) and Appearance-aware Contrastive Loss (ACL) based on recent un-/self-supervised contrastive learning. That is, SCL guarantees domain-invariant representations (e.g., pose, structure) between the generated image and the input image, while ACL ensures domain-specific representations (e.g., color, texture) between the generated image and the reference image. We verify the effectiveness of our method by comparing it with various task-dependent image translation models in both qualitative and quantitative evaluations.
Title: Towards diverse image-to-image translation via adaptive normalization layer and contrast learning. Journal: Computers & Graphics-UK, vol. 123, Article 104017.
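For readers unfamiliar with the normalization layer the abstract builds on, the following is a minimal sketch of the general SPADE idea only: normalize a feature channel, then scale and shift each position with values that depend on the semantic label at that position. It is not SA-SPADE, whose details the abstract does not give. As a simplifying assumption, the per-label lookup table `modulation` stands in for the small convolutional network that produces the scale and shift maps in the original SPADE layer, and all names here are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def spade_like_normalize(
    features: List[List[float]],
    labels: List[List[int]],
    modulation: Dict[int, Tuple[float, float]],
    eps: float = 1e-5,
) -> List[List[float]]:
    """Normalize one feature channel, then modulate it per position.

    features:   2D grid of activations for a single channel.
    labels:     2D grid of semantic class ids, same shape as features.
    modulation: class id -> (gamma, beta); an assumed stand-in for the
                learned, spatially varying scale and shift maps.
    """
    flat = [value for row in features for value in row]
    mean = sum(flat) / len(flat)
    variance = sum((value - mean) ** 2 for value in flat) / len(flat)
    std = math.sqrt(variance + eps)

    output: List[List[float]] = []
    for feature_row, label_row in zip(features, labels):
        out_row = []
        for value, label in zip(feature_row, label_row):
            gamma, beta = modulation[label]
            # Spatially adaptive step: gamma and beta change with the label.
            out_row.append(gamma * (value - mean) / std + beta)
        output.append(out_row)
    return output


features = [[1.0, 3.0], [1.0, 3.0]]
labels = [[0, 0], [1, 1]]  # top row is class 0, bottom row is class 1
modulation = {0: (1.0, 0.0), 1: (2.0, 1.0)}
result = spade_like_normalize(features, labels, modulation)
```

Identical input activations in the two rows end up with different outputs because the semantic layout, not the activation alone, decides the scale and shift.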
Pub Date: 2024-07-19. DOI: 10.1016/j.cag.2024.104015
Alexandre Lopes, Fernando Pereira dos Santos, Diulhio de Oliveira, Mauricio Schiezaro, Helio Pedrini
Deep neural networks have consistently represented the state of the art in most computer vision problems. In these scenarios, larger and more complex models have demonstrated superior performance to smaller architectures, especially when trained with plenty of representative data. With the recent adoption of Vision Transformer (ViT) based architectures and advanced Convolutional Neural Networks (CNNs), the total number of parameters of leading backbone architectures increased from 62M parameters in 2012 with AlexNet to 7B parameters in 2024 with AIM-7B. Consequently, deploying such deep architectures faces challenges in environments with processing and runtime constraints, particularly in embedded systems. This paper covers the main model compression techniques applied to computer vision tasks, enabling modern models to be used in embedded systems. We present the characteristics of compression subareas, compare different approaches, and discuss how to choose the best technique and expected variations when analyzing it on various embedded devices. We also share code to assist researchers and new practitioners in overcoming initial implementation challenges for each subarea and present trends for Model Compression.
Title: Computer Vision Model Compression Techniques for Embedded Systems: A Survey. Journal: Computers & Graphics-UK, vol. 123, Article 104015. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S009784932400150X/pdfft?md5=4a61da15472973e3b8b39fed45db404f&pid=1-s2.0-S009784932400150X-main.pdf
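To make the scale of the problem concrete, the sketch below works out the raw weight storage for the two parameter counts quoted in the abstract and shows one widely used compression technique, symmetric 8-bit quantization. The abstract does not say which techniques the survey covers or how its shared code is written, so the choice of quantization, the assumption of 4 bytes per 32-bit float weight, and the function names are all illustrative.

```python
from typing import List, Tuple


def footprint_gb(num_params: int, bytes_per_param: int) -> float:
    """Storage needed for the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the integer codes."""
    return [q * scale for q in quantized]


# Parameter counts quoted in the abstract, stored as 32-bit floats.
alexnet_fp32 = footprint_gb(62_000_000, 4)    # 0.248 GB
aim7b_fp32 = footprint_gb(7_000_000_000, 4)   # 28.0 GB
aim7b_int8 = footprint_gb(7_000_000_000, 1)   # 7.0 GB

# Toy weight vector to show the round trip.
weights = [1.27, -0.64, 0.0, 0.5]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
```

Storing one byte per weight instead of four cuts weight storage by a factor of four, at the cost of a rounding error of at most half a quantization step per weight.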
Pub Date: 2024-07-14. DOI: 10.1016/j.cag.2024.104012
Ben Veldhuijzen, Remco C. Veltkamp, Omar Ikne, Benjamin Allaert, Hazem Wannous, Marco Emporio, Andrea Giachetti, Joseph J. LaViola Jr., Ruiwen He, Halim Benhabiles, Adnane Cabani, Anthony Fleury, Karim Hammoudi, Konstantinos Gavalas, Christoforos Vlachos, Athanasios Papanikolaou, Ioannis Romanelis, Vlassis Fotis, Gerasimos Arvanitis, Konstantinos Moustakas, Christoph von Tycowicz
Gesture recognition is a tool to enable novel interactions with different techniques and applications, like Mixed Reality and Virtual Reality environments. Despite the recent advancements in gesture recognition from skeletal data, it is still unclear how well state-of-the-art techniques perform in a scenario using precise motions with two hands. This paper presents the results of the SHREC 2024 contest, organized to evaluate methods for recognizing highly similar hand motions using the skeletal spatial coordinate data of both hands. The task is the recognition of 7 motion classes given their spatial coordinates in a frame-by-frame motion. The skeletal data has been captured using a Vicon system and pre-processed into a coordinate system using Blender and Vicon Shogun Post. We created a small, novel dataset with a high variety of durations in frames. This paper reports the results of the contest, describing the techniques created by the 5 research groups for this challenging task and comparing them to our baseline method.
Title: SHREC 2024: Recognition of dynamic hand motions molding clay. Journal: Computers & Graphics-UK, vol. 123, Article 104012. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S009784932400147X/pdfft?md5=d75274a315e451ba3701d800635d5155&pid=1-s2.0-S009784932400147X-main.pdf
Pub Date : 2024-07-14 DOI: 10.1016/j.cag.2024.104011
Anna Shilo, Renata G. Raidou
We propose an interactive game based on visual narratives to edutain, i.e., to educate while entertaining, broad audiences against misleading visualizations in healthcare. Uncertainty at various stages of the visualization pipeline may give rise to misleading visual representations. These comprise misleading elements that may negatively impact the audiences by contributing to misinformed decisions, delayed treatments, and a lack of trust in medical information. We investigate whether visual narratives within the setting of an educational game support recognizing and addressing misleading elements in healthcare-related visualizations. Our methodological approach focuses on three key aspects: (i) identifying uncertainty types in the visualization pipeline which could serve as the origin of misleading elements, (ii) designing fictional visual narratives that comprise several misleading elements linking to these uncertainties, and (iii) proposing an interactive game that aids the communication of these misleading visualization elements to broad audiences. The game features eight fictional visual narratives built around misleading visualizations, each with specific assumptions linked to uncertainties. Players assess the correctness of these assumptions to earn points and rewards. In case of incorrect assessments, interactive explanations are provided to enhance understanding. For an initial assessment of our game, we conducted a user study with 21 participants. Our study indicates that when participants incorrectly assess assumptions, they also spend more time elaborating on the reasons for their mistakes, indicating a willingness to learn more. The study also provided positive indications on game aspects such as memorability, reinforcement, and engagement, while it gave us pointers for future improvement.
{"title":"Visual narratives to edutain against misleading visualizations in healthcare","authors":"Anna Shilo, Renata G. Raidou","doi":"10.1016/j.cag.2024.104011","DOIUrl":"10.1016/j.cag.2024.104011","url":null,"abstract":"<div><p>We propose an interactive game based on visual narratives to <em>edutain</em>, i.e., to educate while entertaining, broad audiences against misleading visualizations in healthcare. Uncertainty at various stages of the visualization pipeline may give rise to misleading visual representations. These comprise misleading elements that may negatively impact the audiences by contributing to misinformed decisions, delayed treatments, and a lack of trust in medical information. We investigate whether visual narratives within the setting of an educational game support recognizing and addressing misleading elements in healthcare-related visualizations. Our methodological approach focuses on three key aspects: <em>(i)</em> identifying uncertainty types in the visualization pipeline which could serve as the origin of misleading elements, <em>(ii)</em> designing fictional visual narratives that comprise several misleading elements linking to these uncertainties, and <em>(iii)</em> proposing an interactive game that aids the communication of these misleading visualization elements to broad audiences. The game features eight fictional visual narratives built around misleading visualizations, each with specific assumptions linked to uncertainties. Players assess the correctness of these assumptions to earn points and rewards. In case of incorrect assessments, interactive explanations are provided to enhance understanding. For an initial assessment of our game, we conducted a user study with 21 participants. Our study indicates that when participants incorrectly assess assumptions, they also spend more time elaborating on the reasons for their mistakes, indicating a willingness to learn more. 
The study also provided positive indications on game aspects such as memorability, reinforcement, and engagement, while it gave us pointers for future improvement.</p></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"123 ","pages":"Article 104011"},"PeriodicalIF":2.5,"publicationDate":"2024-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0097849324001468/pdfft?md5=57b27d37f4d2906c08649b9ce6e5e5e3&pid=1-s2.0-S0097849324001468-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141690022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}