Foundations and Trends in Computer Graphics and Vision: Latest Articles

Semantic Image Segmentation: Two Decades of Research
IF 36.5 | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2023-02-13 | DOI: 10.1561/0600000095 | pp. 1-162
G. Csurka, Riccardo Volpi, Boris Chidlovskii
Semantic image segmentation (SiS) plays a fundamental role in a broad variety of computer vision applications, providing key information for the global understanding of an image. This survey summarizes two decades of research in SiS, reviewing solutions from early historical methods through more recent deep learning methods, including the latest trend of using transformers. We complement the review by discussing particular cases of weak supervision, as well as side machine learning techniques that can improve semantic segmentation, such as curriculum, incremental, or self-supervised learning. State-of-the-art SiS models rely on large amounts of annotated samples, which are more expensive to obtain than labels for tasks such as image classification. Since unlabeled data is significantly cheaper to obtain, it is not surprising that Unsupervised Domain Adaptation (UDA) has achieved broad success within the semantic segmentation community. Therefore, a second core contribution of this book is to summarize five years of a rapidly growing field, Domain Adaptation for Semantic Image Segmentation (DASiS), which reflects both the importance of semantic segmentation itself and the critical need to adapt segmentation models to new environments. In addition to providing a comprehensive survey of DASiS techniques, we also cover newer trends such as multi-domain learning, domain generalization, domain incremental learning, test-time adaptation, and source-free domain adaptation. Finally, we conclude this survey by describing the datasets and benchmarks most widely used in SiS and DASiS, and by briefly discussing related tasks such as instance and panoptic image segmentation, as well as applications such as medical image segmentation.
Citations: 11
Learning-based Visual Compression
IF 36.5 | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2023-01-01 | DOI: 10.1561/0600000101 | pp. 1-112
Ruolei Ji, Lina Karam
Citations: 2
Computational Imaging Through Atmospheric Turbulence
Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2023-01-01 | DOI: 10.1561/0600000103
Stanley H. Chan, Nicholas Chimitt
Seeing through a turbulent atmosphere has been one of the biggest challenges for ground-to-ground long-range incoherent imaging systems. The literature is rich and dates back to Andrey Kolmogorov in the 1940s, followed by a series of major developments by David Fried, Robert Noll, and others during the 1960s and 1970s. However, even though we understand the atmosphere much better today, a gap remains between optics theory and image processing algorithms. In particular, training a deep neural network requires an accurate physical forward model that can synthesize training data at large scale. Traditional wave propagation simulators are not an option here because they are too computationally expensive: simulating a single 256×256 grayscale image would take several minutes.
Citations: 0
Vision-Language Pre-training: Basics, Recent Advances, and Future Trends
IF 36.5 | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2022-10-17 | DOI: 10.48550/arXiv.2210.09263 | pp. 163-352
Zhe Gan, Linjie Li, Chunyuan Li, Lijuan Wang, Zicheng Liu, Jianfeng Gao
This paper surveys vision-language pre-training (VLP) methods for multimodal intelligence that have been developed in the last few years. We group these approaches into three categories: (i) VLP for image-text tasks, such as image captioning, image-text retrieval, visual question answering, and visual grounding; (ii) VLP for core computer vision tasks, such as (open-set) image classification, object detection, and segmentation; and (iii) VLP for video-text tasks, such as video captioning, video-text retrieval, and video question answering. For each category, we present a comprehensive review of state-of-the-art methods, and discuss the progress that has been made and the challenges still being faced, using specific systems and models as case studies. In addition, for each category, we discuss advanced topics being actively explored in the research community, such as big foundation models, unified modeling, in-context few-shot learning, knowledge, robustness, and computer vision in the wild, to name a few.
Citations: 70
Towards Better User Studies in Computer Graphics and Vision
IF 36.5 | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2022-06-23 | DOI: 10.1561/0600000106 | pp. 201-252
Z. Bylinskii, L. Herman, Aaron Hertzmann, Stefanie Hutka, Yile Zhang
Online crowdsourcing platforms have made it increasingly easy to evaluate algorithm outputs with survey questions such as "which image is better, A or B?", leading to the proliferation of such studies in vision and graphics research papers. Results of these studies are often used as quantitative evidence in support of a paper's contributions. On the one hand, we argue that such studies, when conducted hastily as an afterthought, lead to an increase in uninformative and potentially misleading conclusions. On the other hand, in these same communities, user research is underutilized in driving project direction and forecasting user needs and reception. We call for increased attention to both the design and the reporting of user studies in computer vision and graphics papers, toward (1) improved replicability and (2) improved project direction. Together with this call, we offer an overview of methodologies from user experience research (UXR), human-computer interaction (HCI), and applied perception to increase exposure to the available methodologies and best practices. We discuss foundational user research methods (e.g., needfinding) that are presently underutilized in computer vision and graphics research but can provide valuable project direction. We provide further pointers to the literature for readers interested in exploring other UXR methodologies. Finally, we describe broader open issues and recommendations for the research community.
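The abstract's example question, "which image is better, A or B?", is a two-alternative forced-choice design. As a minimal sketch (our own illustration, not a method taken from the monograph), an exact two-sided binomial test shows one way such preference counts can be checked against chance before being reported as evidence. The function name and the counts below are hypothetical; the test assumes independent responses and a 50% chance rate under the null hypothesis.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test for a forced-choice (A vs. B) study.

    k: number of responses preferring A; n: total responses;
    p: probability of choosing A under the null hypothesis (0.5 = chance).
    Returns the probability, under the null, of any outcome no more likely
    than the observed one.
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie in [0, n]")
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # The small relative tolerance keeps outcomes that tie with the
    # observed probability from being dropped by floating-point round-off.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts: 8 of 10 participants preferred method A.
print(binomial_two_sided_p(8, 10))  # prints 0.109375
```

With only 10 responses, even an 8-to-2 split is compatible with chance at the usual 0.05 level. The independence assumption also matters: when one crowd worker answers many trials, responses are correlated and a repeated-measures analysis is needed instead.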
Citations: 8
An Introduction to Neural Data Compression
IF 36.5 | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2022-02-14 | DOI: 10.1561/0600000107 | pp. 113-200
Yibo Yang, S. Mandt, Lucas Theis
Neural compression is the application of neural networks and other machine learning methods to data compression. Recent advances in statistical machine learning have opened up new possibilities for data compression, allowing compression algorithms to be learned end-to-end from data using powerful generative models such as normalizing flows, variational autoencoders, diffusion probabilistic models, and generative adversarial networks. The present article aims to introduce this field of research to a broader machine learning audience by reviewing the necessary background in information theory (e.g., entropy coding, rate-distortion theory) and computer vision (e.g., image quality assessment, perceptual metrics), and providing a curated guide through the essential ideas and methods in the literature thus far.
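The abstract names entropy coding and rate-distortion theory as core background. As a toy sketch (our own illustration, not code from the article), the function below scores a uniform scalar quantizer with the rate-distortion Lagrangian R + λD. The rate is the ideal entropy-coded length of -log2 p(s) bits per symbol, and the distortion is mean squared error. The empirical-frequency probability model and the names `rd_loss`, `step`, and `lam` are assumptions. Learned codecs replace the hand-made quantizer and model with neural networks trained end-to-end.

```python
import math
from collections import Counter


def rd_loss(xs, step, lam):
    """Rate-distortion Lagrangian R + lam * D for a toy scalar codec.

    xs: input samples; step: quantizer step size; lam: trade-off weight.
    Returns (loss, rate in bits/sample, distortion as MSE).
    """
    # Uniform scalar quantizer: round each sample to the nearest multiple of step.
    q = [math.floor(x / step + 0.5) for x in xs]
    n = len(xs)
    # Assumption: the probability model is the empirical symbol frequency,
    # and the cost of transmitting the model itself is ignored. Learned
    # codecs instead use a prior known to both encoder and decoder.
    probs = {s: c / n for s, c in Counter(q).items()}
    # Ideal entropy-coded length is -log2 p(s) bits per symbol; arithmetic
    # coding comes within a few bits of this total in practice.
    rate = sum(-math.log2(probs[s]) for s in q) / n
    dist = sum((x - s * step) ** 2 for x, s in zip(xs, q)) / n
    return rate + lam * dist, rate, dist


signal = [0.1 * i for i in range(20)]
for step in (0.1, 0.5, 1.0):
    loss, rate, dist = rd_loss(signal, step, lam=10.0)
    print(f"step={step}: rate={rate:.3f} bits, MSE={dist:.4f}, loss={loss:.3f}")
```

Increasing `step` lowers the rate and raises the distortion. `lam` determines which point on that trade-off minimizes the loss.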
Citations: 49
Deep Learning for Image/Video Restoration and Super-resolution
IF 36.5 | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2022-01-01 | DOI: 10.1561/0600000100 | pp. 1-110
A. Tekalp
Citations: 1
Deep Learning for Multimedia Forensics
IF 36.5 | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2021-01-01 | DOI: 10.1561/0600000096 | pp. 309-457
Irene Amerini, A. Anagnostopoulos, Luca Maiano, L. R. Celsi
In the last two decades, we have witnessed an immense increase in the use of multimedia content on the internet, for applications ranging from the most innocuous to very critical ones. Naturally, this has given rise to many types of threats when such content is manipulated or used for malicious purposes. For example, fake media can be used to sway personal opinions, to damage the image of a public figure, or for criminal activities such as terrorist propaganda and cyberbullying. The research community has moved to counter these threats by designing manipulation-detection systems based on a variety of techniques, such as signal processing, statistics, and machine learning. This research and practice activity has given rise to the field of multimedia forensics. The success of deep learning in the last decade has led to its use in multimedia forensics as well. In this survey, we look at the latest trends and deep-learning-based techniques introduced to solve three main questions investigated in the field of multimedia forensics. We begin by examining the manipulations of images and videos produced with editing tools, reporting the deep-learning approaches adopted to …
Irene Amerini, Aris Anagnostopoulos, Luca Maiano and Lorenzo Ricciardi Celsi (2021), “Deep Learning for Multimedia Forensics”, Foundations and Trends® in Computer Graphics and Vision: Vol. 12, No. 4, pp 309–457. DOI: 10.1561/0600000096. Full text available at: http://dx.doi.org/10.1561/0600000096
Citations: 7
Discrete Graphical Models - An Optimization Perspective
IF 36.5 | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2019-12-09 | DOI: 10.1561/0600000084 | pp. 160-429
Bogdan Savchynskyy
This monograph is about discrete energy minimization for discrete graphical models. It considers graphical models, or, more precisely, maximum a posteriori inference for graphical models, purely as a combinatorial optimization problem. Modeling, applications, probabilistic interpretations, and many other aspects are either ignored here or appear only in examples and remarks. It covers the integer linear programming formulation of the problem as well as its linear programming, Lagrange, and Lagrange decomposition-based relaxations. In particular, it provides a detailed analysis of the polynomially solvable acyclic and submodular problems, along with the corresponding exact optimization methods. Major approximate methods, such as message passing and graph cut techniques, are also described and analyzed comprehensively. The monograph can be useful for undergraduate and graduate students studying optimization or graphical models, as well as for optimization experts who want to explore graphical models. To make the monograph suitable for both categories of readers, we explicitly separate the chapters on mathematical optimization background from those specific to graphical models.
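The abstract singles out acyclic problems as polynomially solvable. As a minimal sketch of why, assuming a chain-structured model with unary and pairwise costs (the simplest acyclic case), min-sum dynamic programming recovers the exact minimum-energy labeling. On a chain this is exactly what message passing computes. The function name and the nested-list cost format are our own assumptions, not the monograph's notation. The monograph itself treats general acyclic graphs and other solvers.

```python
def chain_map(unary, pairwise):
    """Exact MAP (minimum-energy) labeling of a chain-structured model.

    Energy: E(x) = sum_i unary[i][x_i] + sum_i pairwise[i][x_i][x_{i+1}].
    unary[i][a]:       cost of node i taking label a
    pairwise[i][a][b]: cost of x_i = a and x_{i+1} = b, for i = 0..n-2
    Returns (labels, energy) in O(n * L^2) time for L labels per node.
    """
    cost = list(unary[0])  # cost[a]: best energy of nodes 0..i with x_i = a
    back = []              # back[i-1][b]: best label of x_{i-1} given x_i = b
    for i in range(1, len(unary)):
        new_cost, arg = [], []
        for b in range(len(unary[i])):
            # Min-sum message: best predecessor label when x_i = b.
            a = min(range(len(cost)), key=lambda p: cost[p] + pairwise[i - 1][p][b])
            new_cost.append(cost[a] + pairwise[i - 1][a][b] + unary[i][b])
            arg.append(a)
        back.append(arg)
        cost = new_cost
    # Pick the best final label, then follow the back-pointers to the start.
    labels = [min(range(len(cost)), key=cost.__getitem__)]
    energy = cost[labels[0]]
    for arg in reversed(back):
        labels.append(arg[labels[-1]])
    labels.reverse()
    return labels, energy


potts = [[0, 1], [1, 0]]  # penalty 1 whenever neighbouring labels disagree
print(chain_map([[0, 2], [1, 0], [2, 0]], [potts, potts]))  # ([0, 1, 1], 1)
```

Each step minimizes over every label pair once, which gives the O(nL^2) cost. On graphs with cycles, the same messages are no longer exact in general. Those cases call for the relaxations and approximate methods the abstract lists.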
Citations: 23
Publishing and Consuming 3D Content on the Web: A Survey
IF 36.5 | Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2018-12-13 | DOI: 10.1561/0600000083 | pp. 244-333
Marco Potenziani, M. Callieri, M. Dellepiane, Roberto Scopigno
Three-dimensional content is becoming an important component of the World Wide Web environment. From the advent of WebGL to the present, a large number of solutions have been developed (including libraries, middleware, and applications), encouraging the establishment of 3D data as online media of practical use. The fast development of 3D technologies and related web-based resources makes it difficult to identify and properly understand current trends and open issues. Starting from these premises, this survey analyzes the state of the art of 3D web publishing, reviews the possibilities offered by the major current approaches, proposes a categorization of the features supported by existing solutions, and cross-maps these features with the requirements of a few main application domains. The results of this analysis should help define the technical characteristics needed to build efficient and effective 3D data presentation, taking into account the application contexts.
Marco Potenziani, Marco Callieri, Matteo Dellepiane and Roberto Scopigno (2018), “Publishing and Consuming 3D Content on the Web: A Survey”, Foundations and Trends® in Computer Graphics and Vision: Vol. 10, No. 4, pp 244–333. DOI: 10.1561/0600000083. The version of record is available at: http://dx.doi.org/10.1561/0600000083
Citations: 13