
Proceedings of the 2023 6th International Conference on Machine Vision and Applications: Latest Publications

Digital Holography vs. Display Holography - What are their differences and what do they have in common?
S. Reichelt, G. Pedrini
Holography is a two-stage imaging process in which the wave field of the object is recorded in a first step so that it can be reconstructed in a second step. It relies on the physics of diffraction and interference to record and reconstruct optical wave fields or 3D objects. The connecting element between the recording and reconstruction stages is the hologram itself, in which the holographic code is stored. While in its early days holography was a purely experimental and analog technique, its building blocks were later digitized step by step: holograms were first simulated and later reconstructed by computer, and the hologram storage medium became discretized optical elements, pixelated sensors, and light modulators. Owing to different approaches and use cases, the language of holography has evolved in diverse and sometimes confusing ways. In this paper, we address the differences and similarities between digital holography and display holography. Both techniques are digital, but their meanings in the community sometimes differ. In the general and common understanding, the term digital holography (DH) refers to a digital hologram recording of a wave field emanating from a 3D object, followed by a numerical reconstruction of that object. In contrast, the term computer-generated display holography (CGDH) describes the numerical calculation of the hologram and its physical representation, followed by an experimental reconstruction of the 3D object by optical means. Thus, it is the purpose that distinguishes the two techniques: digital holograms are used to numerically reconstruct and measure previously captured (unknown) objects or object changes, whereas computer-generated display holograms are used to visualize (known) 3D objects or scenes in a way that best mimics natural vision. The purpose of this paper is to clarify the terminology of holography by contrasting digital holography and computer-generated display holography. In particular, we explain how each method works, emphasize their specific characteristics, and mention how they are used in different applications. We also provide some examples of how the two technologies are used.
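As an illustration of the DH side of this distinction, the sketch below performs a numerical reconstruction of a recorded hologram by angular spectrum propagation, a standard refocusing method in digital holography. It is a minimal NumPy example under assumed parameters (wavelength, pixel pitch, propagation distance are placeholders), not the authors' implementation.

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, dx, z):
    """Propagate a complex wave field over distance z (angular spectrum method)."""
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=dx)   # spatial frequencies [1/m]
    fy = np.fft.fftfreq(ny, d=dx)
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    kz = (2 * np.pi / wavelength) * np.sqrt(np.maximum(arg, 0.0))
    H = np.exp(1j * kz * z)         # free-space transfer function
    H[arg < 0] = 0                  # drop evanescent components
    return np.fft.ifft2(np.fft.fft2(field) * H)

# DH workflow: refocus a digitally recorded hologram to the object plane,
# then read off amplitude and phase of the reconstructed wave field.
hologram = np.random.rand(512, 512)          # stand-in for a recorded hologram
rec = angular_spectrum_propagate(hologram, wavelength=633e-9, dx=5e-6, z=0.1)
amplitude, phase = np.abs(rec), np.angle(rec)
```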
Citations: 0
Exploiting Self-Imposed Constraints on RGB and LiDAR for Unsupervised Training
Andreas Hubert, Janis Jung, Konrad Doll
Hand detection in single images is an intensively researched area, and reasonable solutions are already available today. However, fine-tuning detectors for a specific domain remains a tedious task. Unsupervised training procedures can reduce the effort required to create domain-specific datasets and models. In addition, different modalities of the same physical space, here color and depth data, represent objects differently and can thus be exploited. We introduce and evaluate a training pipeline that exploits these modalities in an unsupervised manner. Supervision is omitted by choosing suitable self-imposed constraints on the data source. We compare our training results with ground-truth training results and show that with these modalities the domain can be extended without a single annotation, e.g., for detecting colored gloves.
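The abstract does not spell out the constraints themselves; as one hedged illustration of cross-modal self-supervision, the sketch below keeps only those RGB detections that a depth-based detector independently confirms, so agreement between modalities stands in for manual labels. The box format, IoU threshold, and both detectors are assumptions, not the paper's pipeline.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def cross_modal_pseudo_labels(rgb_boxes, depth_boxes, iou_thresh=0.5):
    """Self-imposed constraint: an RGB detection becomes a training label
    only if the depth modality finds an overlapping object as well."""
    return [r for r in rgb_boxes
            if any(box_iou(r, d) >= iou_thresh for d in depth_boxes)]
```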
Citations: 0
Structure-Enhanced Translation from PET to CT Modality with Paired GANs
Tasnim Ahmed, Ahnaf Munir, Sabbir Ahmed, Md. Bakhtiar Hasan, Md. Taslim Reza, M. H. Kabir
Computed Tomography (CT) images play a crucial role in medical diagnosis and treatment planning. However, acquiring CT images can be difficult in certain scenarios, such as a patient's inability to undergo radiation exposure or the unavailability of a CT scanner. An alternative solution is to generate CT images from other imaging modalities. In this work, we propose a medical image translation pipeline for generating high-quality CT images from Positron Emission Tomography (PET) images using a Pix2Pix Generative Adversarial Network (GAN), an architecture effective in image translation tasks. However, traditional GAN loss functions often fail to capture the structural similarity between the generated and target images. To alleviate this issue, we introduce a Multi-Scale Structural Similarity Index Measure (MS-SSIM) loss in addition to the GAN loss to ensure that the generated images preserve the anatomical structures and patterns present in the real CT images. Experiments on the ‘QIN-Breast’ dataset demonstrate that our proposed architecture achieves a Peak Signal-to-Noise Ratio (PSNR) of 17.70 dB and a Structural Similarity Index Measure (SSIM) of 42.51% in the region of interest.
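A hedged sketch of the combined objective described above, assuming a PyTorch Pix2Pix setup and the third-party pytorch_msssim package; the loss weights are illustrative choices, not the paper's values.

```python
import torch
import torch.nn.functional as F
from pytorch_msssim import ms_ssim  # pip install pytorch-msssim

def generator_loss(disc_fake_logits, fake_ct, real_ct,
                   lambda_l1=100.0, lambda_msssim=10.0):
    # Standard Pix2Pix terms: adversarial loss plus per-pixel L1.
    adv = F.binary_cross_entropy_with_logits(
        disc_fake_logits, torch.ones_like(disc_fake_logits))
    l1 = F.l1_loss(fake_ct, real_ct)
    # Structural term: MS-SSIM is a similarity in [0, 1], so (1 - MS-SSIM)
    # acts as a loss that pushes the generator to preserve anatomy.
    # Assumes images are normalized to [0, 1], hence data_range=1.0.
    structural = 1.0 - ms_ssim(fake_ct, real_ct, data_range=1.0)
    return adv + lambda_l1 * l1 + lambda_msssim * structural
```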
Citations: 0
Comparison of Face Recognition on RGB and Grayscale Color with Deep Learning in Forensic Science
Phornvipa Werukanjana, Prush Sa-nga-ngam, Norapattra Permpool
In forensic-science face recognition, we cannot request high-quality face images from sources; instead, we have grayscale face images from CCTV at night-time crime scenes, RGB face images from web cameras, and so on. This research seeks a satisfactory face recognition method for forensic science that can answer the question “Whose face is this?” at the request of a police investigator. The experiment uses Siamese neural network face recognition in both RGB and grayscale color modes and compares the performance of the two modes. The evaluation reports a confusion matrix, F1-score, and ROC/AUC, together with a likelihood ratio (LR) that supports courts in evidence identification, as recommended by NIST and ENFSI.
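As a hedged sketch of the comparison setup, the snippet below defines a small Siamese embedder in PyTorch whose only difference between the two experiments is the number of input channels (3 for RGB, 1 for grayscale); the architecture and the verification threshold are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseEmbedder(nn.Module):
    """Shared-weight branch; in_ch=3 for RGB faces, in_ch=1 for grayscale."""
    def __init__(self, in_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(), nn.LazyLinear(128))

    def forward(self, x):
        return F.normalize(self.net(x), dim=1)  # unit-length embedding

def same_identity(model, face_a, face_b, threshold=0.8):
    """Verification: both faces pass through the same branch; a small
    embedding distance means 'same person'. Threshold is illustrative."""
    with torch.no_grad():
        d = (model(face_a) - model(face_b)).norm()
    return d.item() < threshold
```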
Citations: 0
On-Demand Multiclass Imaging for Sample Scarcity in Industrial Environments
Joan Orti, F. Moreno-Noguer, V. Puig
While technology pushes toward controlling ever more complex industrial processes, data-related issues remain a non-trivial problem. In this sense, class imbalance and data scarcity consume a lot of time and resources when designing a solution. In the surface defect detection problem, due to the random nature of the process, both situations are very common, as is a general mismatch between the image size and the defect size. In this work, we address a segmentation and classification problem with very few available images from every class, proposing a two-step process. First, by generating fake images using the guided-crop image augmentation method, we train a Pix2pix model for every class to perform mask-to-image translation. Once the model is trained, we also design an automatic mask generator that mimics the shapes in the dataset and thus creates realistic images for every class using the pretrained networks. Finally, using a context aggregation network, we use these fake images as our training set, changing the number of images of every class on demand every few epochs, depending on the evolution of the individual loss term of every class. As a result, we achieve stable and robust segmentation and classification metrics on the NEU Micro surface defect database, regardless of the amount of data available for training.
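The abstract does not give the reallocation rule in closed form; the sketch below shows one plausible version, regenerating the synthetic training set every few epochs with per-class image counts proportional to each class's recent loss. The window size and image counts are assumptions for illustration.

```python
import numpy as np

def images_per_class(loss_history, total_images, window=5):
    """Allocate more synthetic images to classes whose recent loss is higher.
    loss_history: one list of per-epoch loss values per class."""
    recent = np.array([np.mean(h[-window:]) for h in loss_history])
    weights = recent / recent.sum()
    return np.round(weights * total_images).astype(int)

# e.g. six defect classes (as in the NEU dataset), two epochs of losses each
history = [[0.9, 0.8], [0.5, 0.2], [0.8, 0.8],
           [0.4, 0.3], [0.6, 0.5], [0.7, 0.4]]
print(images_per_class(history, total_images=600))  # harder classes get more
```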
Citations: 0
Recovering Image Information from Speckle Noise by Image Processing
Jianlin Nie, S. Hanson, M. Takeda, Wen Wang
As a kind of noise, speckle seriously degrades the imaging quality of optical imaging systems. However, the speckle image carries a large amount of information related to the physical characteristics of the object surface, which can be used as a basis for identifying and judging hidden objects. In this paper, speckle noise removal in optical imaging is studied. The squared moduli of the spectra of short-exposure speckle images are averaged to recover the amplitude information, while the cross-spectrum function is used to recover the phase information. We use this method to process the images and then carry out a simulation experiment that varies two factors: the number of stacked frames and the imaged object. The results show that this method can recover the feature information from the speckle image, verifying the feasibility of the method.
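The amplitude and phase recovery described above matches the classical speckle-imaging estimators; below is a minimal NumPy sketch, assuming a stack of co-registered short-exposure frames (the paper's exact implementation may differ).

```python
import numpy as np

def average_power_spectrum(frames):
    """Amplitude information: average of |F_i(u)|^2 over short exposures,
    which survives the random speckle phase (Labeyrie-style averaging)."""
    return np.mean([np.abs(np.fft.fft2(f)) ** 2 for f in frames], axis=0)

def average_cross_spectrum(frames, shift=1):
    """Phase information: average cross-spectrum <F_i(u) conj(F_i(u + du))>,
    whose argument encodes phase differences between nearby frequencies."""
    acc = 0.0
    for f in frames:
        F = np.fft.fft2(f)
        acc = acc + F * np.conj(np.roll(F, -shift, axis=1))
    return acc / len(frames)

frames = [np.random.rand(256, 256) for _ in range(100)]  # stand-in frames
power = average_power_spectrum(frames)   # -> object amplitude spectrum
cross = average_cross_spectrum(frames)   # -> phase-difference map
```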
Citations: 0
Grasp-Oriented Fine-grained Cloth Segmentation without Real Supervision
Ruijie Ren, Mohit Gurnani Rajesh, Jordi Sanchez-Riera, Fan Zhang, Yurun Tian, Antonio Agudo, Y. Demiris, K. Mikolajczyk, F. Moreno-Noguer
Automatically detecting graspable regions from a single depth image is a key ingredient in cloth manipulation. The large variability of cloth deformations has motivated most current approaches to focus on identifying specific grasping points rather than semantic parts, as the appearance and depth variations of local regions are smaller and easier to model than those of larger ones. However, tasks like cloth folding or assisted dressing require recognizing larger segments, such as semantic edges, which carry more information than points. We thus first tackle the problem of fine-grained region detection in deformed clothes using only a depth image. We implement an approach for T-shirts and define up to 6 semantic regions of varying extent, including edges on the neckline, sleeve cuffs, and hem, plus top and bottom grasping points. We introduce a U-Net-based network to segment and label these parts. Our second contribution concerns the level of supervision required to train the proposed network. While most approaches learn to detect grasping points by combining real and synthetic annotations, in this work we propose a multilayered Domain Adaptation strategy that does not use any real annotations. We thoroughly evaluate our approach on real depth images of a T-shirt annotated with fine-grained labels, and show that training our network only with synthetic labels and our proposed DA approach yields results competitive with real-data supervision.
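As a hedged sketch of the segmentation backbone, here is a compact U-Net in PyTorch that maps a single-channel depth image to per-pixel logits over the six semantic regions plus background; the channel widths and network depth are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(inplace=True))

class TinyUNet(nn.Module):
    """Depth image in -> per-pixel logits over 6 cloth regions + background."""
    def __init__(self, n_classes=7):
        super().__init__()
        self.enc1, self.enc2 = conv_block(1, 32), conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.mid = conv_block(64, 128)
        self.up2 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec2 = conv_block(128, 64)      # 128 = upsampled 64 + skip 64
        self.up1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = conv_block(64, 32)       # 64 = upsampled 32 + skip 32
        self.head = nn.Conv2d(32, n_classes, 1)

    def forward(self, x):                    # x: (B, 1, H, W), H, W % 4 == 0
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        m = self.mid(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(m), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)

logits = TinyUNet()(torch.randn(1, 1, 128, 128))  # -> (1, 7, 128, 128)
```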
Citations: 2
Proceedings of the 2023 6th International Conference on Machine Vision and Applications
{"title":"Proceedings of the 2023 6th International Conference on Machine Vision and Applications","authors":"","doi":"10.1145/3589572","DOIUrl":"https://doi.org/10.1145/3589572","url":null,"abstract":"","PeriodicalId":296325,"journal":{"name":"Proceedings of the 2023 6th International Conference on Machine Vision and Applications","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132793730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0