
Latest Publications in IPSJ Transactions on Computer Vision and Applications

Robust Feature Matching for Distorted Projection by Spherical Cameras
Q1 Computer Science Pub Date: 2015-01-01 DOI: 10.2197/ipsjtcva.7.84
Hajime Taira, Yuki Inoue, A. Torii, M. Okutomi
In this work, we propose a simple yet effective method for improving the performance of local feature matching among equirectangular cylindrical images, which brings more stable and complete 3D reconstruction by incremental SfM. The key idea is to explicitly generate synthesized images by rotating the spherical panoramic images and to detect and describe features only in the less distorted area of the rectified panoramic images. We demonstrate that the proposed method is advantageous for both rotational and translational camera motions compared with the standard methods on synthetic data. We also demonstrate that the proposed feature matching is beneficial for incremental SfM through experiments on the Pittsburgh Research dataset.
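As a rough illustration of the key idea, the sketch below (Python with OpenCV and NumPy) rotates an equirectangular panorama and restricts feature detection to the low-distortion band around the equator. The file name, ORB detector, band width, and rotation angle are illustrative assumptions, not the paper's settings.

```python
import cv2
import numpy as np

def rotate_equirectangular(img, R):
    """Resample an equirectangular panorama so that the underlying sphere is rotated by R (3x3)."""
    h, w = img.shape[:2]
    lon = (np.arange(w) + 0.5) / w * 2.0 * np.pi - np.pi        # destination longitudes
    lat = np.pi / 2.0 - (np.arange(h) + 0.5) / h * np.pi        # destination latitudes
    lon, lat = np.meshgrid(lon, lat)
    rays = np.stack([np.cos(lat) * np.sin(lon),                 # unit ray per destination pixel
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=-1)
    rays = rays @ R                                              # applies R^T, i.e., maps rays back to the source frame
    src_lon = np.arctan2(rays[..., 0], rays[..., 2])
    src_lat = np.arcsin(np.clip(rays[..., 1], -1.0, 1.0))
    map_x = ((src_lon + np.pi) / (2.0 * np.pi) * w - 0.5).astype(np.float32)
    map_y = ((np.pi / 2.0 - src_lat) / np.pi * h - 0.5).astype(np.float32)
    return cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR)

def detect_in_equatorial_band(img, band_deg=30.0):
    """Detect/describe features only within +/- band_deg of latitude, where distortion is small."""
    h, w = img.shape[:2]
    half = int(h * band_deg / 180.0)
    mask = np.zeros((h, w), np.uint8)
    mask[h // 2 - half:h // 2 + half, :] = 255
    return cv2.ORB_create(2000).detectAndCompute(img, mask)

# Hypothetical usage: bring the poles onto the equator with a 90-degree rotation about
# the horizontal axis, then detect features in the low-distortion band of both views.
pano = cv2.imread("pano.jpg", cv2.IMREAD_GRAYSCALE)
R, _ = cv2.Rodrigues(np.array([np.pi / 2.0, 0.0, 0.0]))
kp1, des1 = detect_in_equatorial_band(pano)
kp2, des2 = detect_in_equatorial_band(rotate_equirectangular(pano, R))
```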
Citations: 17
Cost-alleviative Learning for Deep Convolutional Neural Network-based Facial Part Labeling
Q1 Computer Science Pub Date: 2015-01-01 DOI: 10.2197/ipsjtcva.7.99
Takayoshi Yamashita, Takaya Nakamura, Hiroshi Fukui, Yuji Yamauchi, H. Fujiyoshi
Facial part labeling, which parses semantic components, enables high-level facial image analysis and contributes greatly to face recognition, expression recognition, animation, and synthesis. In this paper, we propose a cost-alleviative learning method that uses a weighted cost function to improve the performance of certain classes during facial part labeling. As the conventional cost function handles the error in all classes equally, the error in a class with a slightly biased prior probability tends not to be propagated. The weighted cost function enables the training coefficient for each class to be adjusted. In addition, the boundaries of each class may be recognized after fewer iterations, which improves performance. In facial part labeling, the recognition performance of the eye class can be significantly improved using cost-alleviative learning.
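A minimal sketch of a class-weighted cost of this kind is given below; the class names, weight values, and exact weighting scheme are illustrative assumptions rather than the paper's configuration.

```python
import numpy as np

def weighted_cross_entropy(probs, labels, class_weights):
    """
    Class-weighted cross-entropy over per-pixel predictions.
      probs:         (N, C) softmax outputs
      labels:        (N,)   ground-truth class indices
      class_weights: (C,)   training coefficient per class; raising the weight of a
                            small class (e.g., 'eye') makes its errors count more.
    """
    eps = 1e-12
    picked = probs[np.arange(len(labels)), labels]        # probability of the true class
    return float(np.mean(class_weights[labels] * -np.log(picked + eps)))

# Toy example with three classes (background, skin, eye) and the eye class up-weighted.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4]])
labels = np.array([0, 1, 2])
weights = np.array([1.0, 1.0, 4.0])
print(weighted_cross_entropy(probs, labels, weights))
```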
Citations: 13
Mahalanobis Encodings for Visual Categorization
Q1 Computer Science Pub Date: 2015-01-01 DOI: 10.2197/ipsjtcva.7.69
Tomoki Matsuzawa, Raissa Relator, Wataru Takei, S. Omachi, Tsuyoshi Kato
Nowadays, the design of the representation of images is one of the most crucial factors in the performance of visual categorization. A common pipeline employed in most recent research for obtaining an image representation consists of two steps: an encoding step and a pooling step. In this paper, we introduce the Mahalanobis metric to two popular image patch encoding modules, Histogram Encoding and Fisher Encoding, which are used in the Bag-of-Visual-Words method and the Fisher Vector method, respectively. Moreover, for the proposed Fisher Vector method, a closed-form approximation of the Fisher Vector can be derived under the same assumption used in the original Fisher Vector, and the codebook is built without resorting to time-consuming EM (Expectation-Maximization) steps. Experimental evaluation on multi-class classification demonstrates the effectiveness of the proposed encoding methods.
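The sketch below illustrates the general idea of replacing the Euclidean distance with a Mahalanobis distance inside a soft-assignment histogram encoding. It uses a single shared covariance and random stand-in data, and is not the paper's exact formulation.

```python
import numpy as np

def mahalanobis_histogram_encoding(descriptors, codebook, cov):
    """
    Soft-assignment histogram encoding in which descriptor-to-codeword distances are
    Mahalanobis distances (x - c)^T S^{-1} (x - c) rather than Euclidean distances.
      descriptors: (N, D) local descriptors from one image
      codebook:    (K, D) visual words
      cov:         (D, D) covariance S estimated from training descriptors
    """
    prec = np.linalg.inv(cov)
    diff = descriptors[:, None, :] - codebook[None, :, :]            # (N, K, D)
    d2 = np.einsum('nkd,de,nke->nk', diff, prec, diff)               # squared Mahalanobis distances
    assign = np.exp(-0.5 * d2)
    assign /= assign.sum(axis=1, keepdims=True)                      # per-descriptor soft assignment
    hist = assign.sum(axis=0)                                        # sum-pooled histogram
    return hist / np.linalg.norm(hist)

# Toy usage with random stand-ins for SIFT descriptors and a learned codebook.
rng = np.random.default_rng(0)
desc = rng.normal(size=(100, 16))
words = rng.normal(size=(8, 16))
cov = np.cov(desc, rowvar=False) + 1e-3 * np.eye(16)
print(mahalanobis_histogram_encoding(desc, words, cov))
```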
Citations: 3
Mathematical Information Retrieval (MIR) from Scanned PDF Documents and MathML Conversion
Q1 Computer Science Pub Date: 2014-12-10 DOI: 10.2197/ipsjtcva.6.132
A. Nazemi, I. Murray, D. McMeekin
This paper describes part of an ongoing comprehensive research project that is aimed at generating a MathML format from images of mathematical expressions that have been extracted from scanned PDF documents. A MathML representation of a scanned PDF document reduces the document's storage size and encodes the mathematical notation and meaning. The MathML representation then becomes suitable for vocalization and accessible through the use of assistive technologies. In order to achieve an accurate layout analysis of a scanned PDF document, all textual and non-textual components must be recognised, identified and tagged. These components may be text or mathematical expressions and graphics in the form of images, figures, tables and/or diagrams. Mathematical expressions are one of the most significant components within scanned scientific and engineering PDF documents and need to be machine readable for use with assistive technologies. This research is a work in progress and includes multiple different modules: detecting and extracting mathematical expressions, recursive primitive component extraction, non-alphanumerical symbol recognition, structural semantic analysis, and merging primitive components to generate the MathML of the scanned PDF document. An optional module converts MathML to audio format using a Text to Speech engine (TTS) to make the document accessible for vision-impaired users. Keywords: math recognition, graphics recognition, Mathematical Information Retrieval
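To make the target representation concrete, the sketch below assembles a small presentation-MathML fragment for a recognized fraction using Python's standard library; it only illustrates the kind of output the merging module would produce and is not the project's actual code.

```python
import xml.etree.ElementTree as ET

def fraction_to_mathml(numerator, denominator_terms):
    """Build a presentation-MathML fragment for a simple fraction such as x / (y + 1)."""
    math = ET.Element('math', xmlns='http://www.w3.org/1998/Math/MathML')
    frac = ET.SubElement(math, 'mfrac')
    ET.SubElement(frac, 'mi').text = numerator
    den = ET.SubElement(frac, 'mrow')
    for i, term in enumerate(denominator_terms):
        if i > 0:
            ET.SubElement(den, 'mo').text = '+'
        ET.SubElement(den, 'mi' if term.isalpha() else 'mn').text = term
    return ET.tostring(math, encoding='unicode')

# "x over (y + 1)" as it might be assembled once the primitive symbols and their
# spatial relations (above/below the fraction bar) have been recognized.
print(fraction_to_mathml('x', ['y', '1']))
```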
Citations: 2
Image Classification Using a Mixture of Subspace Models
Q1 Computer Science Pub Date: 2014-01-01 DOI: 10.2197/ipsjtcva.6.93
Takashi Takahashi, Takio Kurita
This paper introduces a novel method for image classification using local feature descriptors. The method utilizes linear subspaces of local descriptors to characterize their distribution and extract image features. The extracted features are transformed into more discriminative features by linear discriminant analysis and employed for recognizing their categories. Experimental results demonstrate that this method is competitive with the Fisher kernel method in terms of classification accuracy.
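One plausible reading of the subspace representation is sketched below: each image is summarized by the top principal directions of its local descriptors, and LDA then maps these features to a more discriminative space. The per-image encoding, the toy data, and the use of scikit-learn's LDA are assumptions for illustration, not the paper's exact pipeline.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def subspace_feature(local_descriptors, k=3):
    """Summarize one image by an orthonormal basis of the k-D subspace fitting its descriptors."""
    X = local_descriptors - local_descriptors.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    basis = vt[:k]                                             # (k, D) top principal directions
    basis = basis * np.where(basis[:, :1] >= 0, 1.0, -1.0)     # fix sign so the feature is well defined
    return basis.ravel()

def toy_descriptors(c, rng):
    """Random 32-D 'local descriptors' whose dominant direction depends on the class label c."""
    d = rng.normal(size=(200, 32))
    d[:, c] *= 4.0
    return d

rng = np.random.default_rng(0)
y = np.repeat(np.arange(4), 10)                                # 40 toy images, 4 classes
X = np.array([subspace_feature(toy_descriptors(c, rng)) for c in y])

# LDA maps the subspace features into a more discriminative space before classification.
lda = LinearDiscriminantAnalysis()
Z = lda.fit_transform(X, y)
print(Z.shape)                                                 # (40, 3): at most n_classes - 1 directions
```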
Citations: 0
Hyper-renormalization: Non-minimization Approach for Geometric Estimation
Q1 Computer Science Pub Date: 2014-01-01 DOI: 10.2197/ipsjtcva.6.143
K. Kanatani, A. Al-Sharadqah, N. Chernov, Y. Sugaya
The technique of “renormalization” for geometric estimation attracted much attention when it appeared in the early 1990s for having higher accuracy than any other then-known method. The key fact is that it directly specifies equations to solve, rather than minimizing some cost function. This paper expounds this “non-minimization approach” in detail and exploits this principle to modify renormalization so that it outperforms standard reprojection error minimization. Doing a precise error analysis in the most general situation, we derive a formula that maximizes the accuracy of the solution; we call it hyper-renormalization. Applying it to ellipse fitting, fundamental matrix computation, and homography computation, we confirm its accuracy and efficiency for sufficiently small noise. Our emphasis is on the general principle, rather than on individual methods for particular problems.
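The "directly solve an equation" flavor can be seen in the classical direct ellipse fit of Fitzgibbon et al., sketched below as a generalized eigenvalue problem. This is not hyper-renormalization, which additionally weights the problem by the noise properties of the data, but it shows the style of non-minimization computation involved.

```python
import numpy as np

def fit_ellipse_direct(x, y):
    """
    Direct algebraic ellipse fit (Fitzgibbon et al., 1999): find conic coefficients
    a = (A, B, C, D, E, F) of A x^2 + B xy + C y^2 + D x + E y + F = 0 by solving
    the generalized eigenproblem S a = lambda K a, where K encodes 4AC - B^2 = 1.
    """
    D_mat = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    S = D_mat.T @ D_mat                     # scatter matrix of the data
    K = np.zeros((6, 6))                    # constraint matrix for 4AC - B^2 = 1
    K[0, 2] = K[2, 0] = 2.0
    K[1, 1] = -1.0
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S) @ K)
    idx = np.argmax(eigvals.real)           # exactly one eigenvalue is positive; it gives the ellipse
    return eigvecs[:, idx].real

# Noisy samples of the ellipse (x/3)^2 + (y/2)^2 = 1.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 2.0 * np.pi, 60)
x = 3.0 * np.cos(t) + 0.02 * rng.normal(size=t.size)
y = 2.0 * np.sin(t) + 0.02 * rng.normal(size=t.size)
a = fit_ellipse_direct(x, y)
print(a / np.linalg.norm(a))                # proportional to (1/9, 0, 1/4, 0, 0, -1) up to sign
```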
Citations: 9
Depth from Projector's Defocus Based on Multiple Focus Pattern Projection
Q1 Computer Science Pub Date: 2014-01-01 DOI: 10.2197/ipsjtcva.6.88
H. Masuyama, Hiroshi Kawasaki, Furukawa Ryo
For 3D active measurement methods using a video projector, there is an implicit limitation that the projected patterns must be in focus on the target object. Such a limitation sets severe constraints on the possible depth range for reconstruction. In order to overcome this problem, a Depth from Defocus (DfD) method using multiple patterns with different in-focus depths is proposed in this paper to expand the depth range. With this method, not only is the depth range extended, but the shape can also be recovered even if there is an obstacle between the projector and the target, because of the large aperture of the projector. Furthermore, thanks to the advantage of DfD that it does not require a baseline between the camera and the projector, occlusion does not occur with the method. In order to verify the effectiveness of the method, several experiments using the actual system were conducted to estimate the depth of several objects.
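A highly simplified defocus cue is sketched below: capture the scene once per projector focus setting, measure per-pixel pattern contrast, and take the sharpest setting as a coarse depth index. The sharpness measure, window size, and focus depths are illustrative assumptions; the paper's DfD formulation is more involved.

```python
import cv2
import numpy as np

def sharpness_map(img, ksize=9):
    """Per-pixel contrast of the projected pattern: local mean of the squared Laplacian."""
    lap = cv2.Laplacian(img.astype(np.float32), cv2.CV_32F)
    return cv2.blur(lap * lap, (ksize, ksize))

def depth_index_from_defocus(captures):
    """
    captures: one image per projector focus setting (each pattern is sharpest at a
    different, known depth). Returns the index of the sharpest setting at every pixel;
    a calibration table would map this index (or an interpolated peak) to metric depth.
    """
    stack = np.stack([sharpness_map(img) for img in captures], axis=0)
    return np.argmax(stack, axis=0)

# Hypothetical usage: three captures with the projector focused at 0.5 m, 1.0 m and 1.5 m.
# captures = [cv2.imread(f"focus_{i}.png", cv2.IMREAD_GRAYSCALE) for i in range(3)]
# depth_idx = depth_index_from_defocus(captures)
```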
Citations: 10
Decomposing Three Fundamental Matrices for Initializing 3-D Reconstruction from Three Views
Q1 Computer Science Pub Date: 2014-01-01 DOI: 10.2197/ipsjtcva.6.120
Yasushi Kanazawa, Y. Sugaya, K. Kanatani
This paper focuses on initializing 3-D reconstruction from scratch without any prior scene information. Traditionally, this has been done from two-view matching, which is prone to the degeneracy called “imaginary focal lengths.” We overcome this difficulty by using three images, but we do not require three-view matching; all we need are three fundamental matrices separately computed from pair-wise image matching. We exploit the redundancy of the three fundamental matrices to optimize the camera parameters and the 3-D structure. The main theme of this paper is to give an analytical procedure for computing the positions and orientations of the three cameras and their internal parameters from three fundamental matrices. The emphasis is on resolving the ambiguity of the solution resulting from the sign indeterminacy of the fundamental matrices. We perform numerical simulations to show that imaginary focal lengths are less likely with our three-view method, resulting in higher accuracy than the conventional two-view method. We also test the degeneracy tolerance capability of our method by using endoscopic intestine tract images, for which the camera configuration is almost always nearly degenerate. We demonstrate that our method allows us to obtain more detailed intestine structures than two-view reconstruction and observe how our three-view reconstruction is refined by bundle adjustment. Our method is expected to broaden medical applications of endoscopic images.
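The standard two-view building blocks that such a pipeline starts from are sketched below with OpenCV; the joint decomposition of the three fundamental matrices and the sign disambiguation are the paper's contribution and are not reproduced here. Point-array and matrix names are hypothetical.

```python
import cv2
import numpy as np

def pairwise_fundamental(pts_a, pts_b):
    """Fundamental matrix from point matches of one image pair, estimated with RANSAC."""
    F, inlier_mask = cv2.findFundamentalMat(pts_a, pts_b, cv2.FM_RANSAC, 1.0, 0.999)
    return F, inlier_mask

def relative_pose_from_F(F, K, pts_a, pts_b):
    """
    Once intrinsics are available (the paper recovers them analytically from the three F's),
    the essential matrix and the relative rotation/translation follow from standard routines.
    Assumes both views share the same camera matrix K.
    """
    E = K.T @ F @ K
    _, R, t, _ = cv2.recoverPose(E, pts_a, pts_b, K)
    return R, t

# Setup sketch: matching the pairs (1,2), (2,3) and (3,1) yields F12, F23 and F31
# (e.g., F12, _ = pairwise_fundamental(pts1_12, pts2_12)); the paper's contribution is the
# joint decomposition of these three matrices into intrinsics and poses, which is not shown here.
```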
Citations: 2
Underwater 3D Surface Capture Using Multi-view Projectors and Cameras with Flat Housings
Q1 Computer Science Pub Date: 2014-01-01 DOI: 10.2197/IPSJTCVA.6.43
Ryo Kawahara, S. Nobuhara, T. Matsuyama
This paper is aimed at realizing a practical image-based 3D surface capture system for underwater objects. Image-based 3D shape acquisition of objects in water has a wide variety of academic and industrial applications because of its non-contact and non-invasive sensing properties. For example, 3D shape capture of fertilized eggs and young fish can provide a quantitative evaluation method for life science and aquaculture. In realizing such a system, we utilize fully-calibrated multi-view projectors and cameras in water (Fig. 1). Underwater projectors serve as reverse cameras while providing additional textures on poorly-textured targets. To this end, this paper focuses on the refraction caused by flat housings, while underwater photography involves other complex light events such as scattering [3,16,17], specularity [4], and transparency [13]. This is because one of the main difficulties in image-based 3D surface estimation in water is to account for refractions caused by flat housings, since flat housings cause epipolar lines to be curved and hence the local support window for texture matching to be inconstant. To cope with this issue, we can project 3D candidate points in water to 2D image planes taking the refraction into account explicitly. However, projecting a 3D point in water to a camera via a flat housing is known to be a time-consuming process which requires solving a 12th-degree equation for each projection [1]. This fact indicates that 3D shape estimation in water cannot be practical as long as it is done by using the analytical projection computation. To solve this problem, we model both the projectors and cameras with flat housings based on the pixel-wise varifocal model [9]. Since this virtual camera model provides an efficient forward (3D-to-2D) projection, we can make the 3D shape estimation process feasible. The key contribution of this paper is twofold. First, we propose a practical method to calibrate underwater projectors with flat housings based on the pixel-wise varifocal model. Second, we show a system for underwater 3D surface capture based on the space carving principle [12] using multiple projectors and cameras in water.
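The refraction geometry at a flat port can be illustrated by back-projecting a pixel through the housing with Snell's law, as in the hedged sketch below; it ignores the glass thickness and uses made-up intrinsics, whereas the pixel-wise varifocal model handles the full housing and the expensive forward projection.

```python
import numpy as np

def refract(d, n, eta):
    """
    Vector form of Snell's law: refract unit direction d at an interface with unit
    normal n pointing toward the incident side, eta = n_incident / n_transmitted.
    Returns None on total internal reflection.
    """
    cos_i = -np.dot(d, n)
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None
    return eta * d + (eta * cos_i - np.sqrt(1.0 - sin2_t)) * n

def backproject_through_flat_port(pixel, K, housing_z, n_air=1.0, n_water=1.33):
    """
    Back-project a pixel into water through a flat port at z = housing_z (camera at the
    origin, optical axis +z, glass thickness ignored). Returns the point where the ray
    meets the port and the refracted ray direction in water.
    """
    ray = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    ray /= np.linalg.norm(ray)
    hit = ray * (housing_z / ray[2])
    d_water = refract(ray, np.array([0.0, 0.0, -1.0]), n_air / n_water)
    return hit, d_water

# Made-up intrinsics; the flat port sits 5 cm in front of the camera.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
print(backproject_through_flat_port((400.0, 300.0), K, housing_z=0.05))
```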
Citations: 9
ILSVRC on a Smartphone
Q1 Computer Science Pub Date: 2014-01-01 DOI: 10.2197/ipsjtcva.6.83
Yoshiyuki Kawano, Keiji Yanai
In this work, to the best of our knowledge, we propose the first stand-alone large-scale image classification system running on an Android smartphone. The objective of this work is to prove that mobile large-scale image classification requires no communication with external servers. To do that, we propose a scalar-based compression method for the weight vectors of linear classifiers. As an additional characteristic, the proposed method does not need to uncompress the compressed vectors for evaluation of the classifiers, which saves recognition time. We have implemented a large-scale image classification system on an Android smartphone, which can perform 1000-class classification for a given image in 0.270 seconds. In the experiment, we show that compressing the weights to 1/8 led to only 0.80% performance loss for 1000-class classification with the ILSVRC2012 dataset. In addition, the experimental results indicate that weight vectors compressed to low bits, even in the binarized case (bit=1), are still valid for classification of high-dimensional vectors.
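The sketch below illustrates, under simplifying assumptions, how a scalar-quantized linear classifier can be evaluated without reconstructing the float weight vector: feature values are accumulated per quantization level first, so only one multiplication per level is needed. The bit width and quantizer here are illustrative, not the paper's exact compression scheme.

```python
import numpy as np

def quantize_weights(w, bits=3):
    """Uniform scalar quantization of a float weight vector into 2**bits levels."""
    levels = 2 ** bits
    lo, hi = float(w.min()), float(w.max())
    step = (hi - lo) / (levels - 1)
    idx = np.round((w - lo) / step).astype(np.uint8)      # stored per-weight level index
    codebook = lo + np.arange(levels) * step              # level index -> float value
    return idx, codebook

def score_without_decompression(idx, codebook, x):
    """
    Dot product <w_hat, x> evaluated from the compressed form: feature values are first
    accumulated per quantization level, so the float weight vector is never rebuilt.
    """
    per_level = np.bincount(idx, weights=x, minlength=len(codebook))
    return float(per_level @ codebook)

rng = np.random.default_rng(0)
w = rng.normal(size=4096)        # one class's linear-classifier weights
x = rng.random(size=4096)        # a high-dimensional image feature
idx, codebook = quantize_weights(w, bits=3)
print(score_without_decompression(idx, codebook, x), float(w @ x))   # the two scores are close
```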
Citations: 5