
Latest publications: 2010 IEEE 39th Applied Imagery Pattern Recognition Workshop (AIPR)

On detecting abnormalities in digital mammography
Pub Date: 2010-10-01 | DOI: 10.1109/AIPR.2010.5759684
W. Yousef, Waleed Mustafa, Ali A. Ali, Naglaa A. Abdelrazek, Ahmed M. Farrag
Breast cancer is the most common cancer in many countries all over the world. Early detection of cancer, in either diagnosis or screening programs, decreases mortality rates. Computer Aided Detection (CAD) software aids radiologists in detecting abnormalities in medical images. In this article we present our approach to detecting abnormalities in mammograms using digital mammography. Each mammogram in our dataset is manually processed by a radiologist, using software specially developed for that purpose, to mark and label different types of abnormalities. Once marked, all subsequent processing is applied using computer algorithms. The majority of existing detection techniques rely on image processing (IP) to extract Regions of Interest (ROIs) and then extract features from those ROIs as the input to a statistical learning machine (classifier). In this approach, detection is essentially done at the IP phase, while the ultimate role of the classifier is to reduce the number of False Positives (FPs) detected in the IP phase. In contrast, in the pixel-based approach, processing algorithms and classifiers work directly at the pixel level. We demonstrate the performance of several methods belonging to this approach and suggest an assessment metric in terms of the Mann-Whitney statistic.
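As a concrete reading of the proposed metric, the sketch below (Python with NumPy/SciPy; the function name and synthetic scores are illustrative, not from the paper) computes the Mann-Whitney statistic from pixel-level classifier scores. The statistic is the probability that a randomly chosen abnormal pixel outscores a randomly chosen normal one, which equals the area under the ROC curve.

    import numpy as np
    from scipy.stats import rankdata

    def mann_whitney_auc(pos_scores, neg_scores):
        """AUC via the rank-sum form of the Mann-Whitney U statistic."""
        pos = np.asarray(pos_scores, dtype=float)
        neg = np.asarray(neg_scores, dtype=float)
        ranks = rankdata(np.concatenate([pos, neg]))  # ties get average ranks
        u = ranks[:len(pos)].sum() - len(pos) * (len(pos) + 1) / 2.0
        return u / (len(pos) * len(neg))

    # Synthetic example: classifier scores on abnormal vs. normal pixels.
    rng = np.random.default_rng(0)
    abnormal = rng.normal(1.0, 1.0, 500)
    normal = rng.normal(0.0, 1.0, 5000)
    print(f"Mann-Whitney AUC: {mann_whitney_auc(abnormal, normal):.3f}")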
Citations: 5
Laplacian based image fusion
Pub Date: 2010-10-01 | DOI: 10.1109/AIPR.2010.5759697
J. Scott, M. Pusateri
A fundamental goal in multispectral image fusion is to combine relevant information from multiple spectral ranges while displaying a constant amount of data as a single channel. Because we expect synergy between the views afforded by different parts of the spectrum, producing output imagery with more information than any of the individual inputs sounds simple. While fusion algorithms achieve synergy under specific scenarios, they often produce imagery with less information than any single band of imagery. Losses can arise from any number of problems, including poor imagery in one band degrading the fusion result, loss of detail from intrinsic smoothing, artifacts or discontinuities from discrete mixing, and distracting colors from unnatural color mapping. We have been developing and testing fusion algorithms with the goal of achieving synergy under a wider range of scenarios. The underlying technique has been very successful in image blending, mosaicking, and image compositing for visible-band imagery. The algorithm presented in this paper is based on direct pixel-wise fusion that merges the directional discrete Laplacian content of the individual imagery bands rather than the intensities directly. The Laplacian captures the local difference in the four-connected neighborhood. The Laplacians of the images are then mixed based on the premise that image edges contain the most pertinent information from each input image. This information is then re-formed into an image by solving the two-dimensional Poisson equation. The preliminary results are promising and consistent. When fusing multiple continuous visible channels, the resulting image is similar to grayscale imaging over all of the visible channels. When fusing discontinuous and/or non-visible channels, the resulting image is subtly mixed and intuitive to understand.
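A minimal sketch of this pipeline in Python (NumPy/SciPy), assuming two pre-registered single-channel bands in [0, 1]: the four-connected Laplacians are mixed per pixel by magnitude, and the result is recovered by solving the 2D Poisson equation with a discrete-sine-transform solver. The zero-Dirichlet boundary and the final rescaling are simplifications of this sketch, not details from the paper.

    import numpy as np
    from scipy import ndimage
    from scipy.fft import dstn, idstn

    def fuse_laplacian(band_a, band_b):
        # 4-connected discrete Laplacian of each input band
        lap_a = ndimage.laplace(band_a.astype(float))
        lap_b = ndimage.laplace(band_b.astype(float))
        # keep, per pixel, the Laplacian with the larger magnitude:
        # edges carry the most pertinent information from each input
        mixed = np.where(np.abs(lap_a) >= np.abs(lap_b), lap_a, lap_b)
        # solve laplacian(u) = mixed; DST-I diagonalizes the discrete
        # Laplacian under zero Dirichlet boundary conditions
        m, n = mixed.shape
        eig_x = 2.0 * np.cos(np.pi * np.arange(1, m + 1) / (m + 1)) - 2.0
        eig_y = 2.0 * np.cos(np.pi * np.arange(1, n + 1) / (n + 1)) - 2.0
        u = idstn(dstn(mixed, type=1) / (eig_x[:, None] + eig_y[None, :]), type=1)
        # reconstruction is only defined up to boundary conditions;
        # rescale into [0, 1] for display
        u -= u.min()
        return u / max(u.max(), 1e-12)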
Citations: 8
Modeling spatial dependencies in high-resolution overhead imagery
Pub Date: 2010-10-01 | DOI: 10.1109/AIPR.2010.5759714
A. Cheriyadat, Ranga Raju Vatsavai, E. Bright
Human settlement regions with different physical and socio-economic attributes exhibit unique spatial characteristics that are often visible in high-resolution overhead imagery. For example, the size, shape, and spatial arrangement of man-made structures are key attributes that vary with the socio-economic profile of the neighborhood. Successfully modeling these attributes is crucial in developing advanced image understanding systems for interpreting complex aerial scenes. In this paper we present three different approaches to modeling the spatial context in overhead imagery. First, we show that the frequency domain of the image can be used to model the spatial context [1]. The shape of the spectral energy contours characterizes the scene context and can be exploited as a global feature. Second, we explore a discriminative framework based on Conditional Random Fields (CRF) [2] to model the spatial context in overhead imagery. The features derived from the edge orientation distribution calculated for a neighborhood, together with the associated class labels, are used as input features to model the spatial context. Our third approach groups spatially connected pixels based on low-level edge primitives to form support regions [3]. The statistical parameters generated from the support-region feature distributions characterize different geospatial neighborhoods. We apply our approaches to high-resolution overhead imagery and show that they characterize its spatial context.
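As an illustration of the first (frequency-domain) approach, the sketch below (Python/NumPy; the bin count and log scaling are illustrative choices, not taken from [1]) radially averages the power spectrum of an image patch into a short global descriptor whose shape follows the spectral energy contours.

    import numpy as np

    def spectral_energy_profile(patch, n_bins=32):
        """Radially averaged log power spectrum of a 2D patch."""
        f = np.fft.fftshift(np.fft.fft2(patch - patch.mean()))
        power = np.log1p(np.abs(f) ** 2)
        h, w = patch.shape
        yy, xx = np.mgrid[:h, :w]
        r = np.hypot(yy - h / 2.0, xx - w / 2.0)  # radius from the DC term
        bins = np.linspace(0.0, r.max(), n_bins + 1)
        idx = np.clip(np.digitize(r.ravel(), bins) - 1, 0, n_bins - 1)
        total = np.bincount(idx, weights=power.ravel(), minlength=n_bins)
        count = np.bincount(idx, minlength=n_bins)
        return total / np.maximum(count, 1)  # mean energy per radial band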
Citations: 2
Speech Emotion Recognition using a backward context
Pub Date: 2010-10-01 | DOI: 10.1109/AIPR.2010.5759701
Erhan Guven, P. Bock
The classification of emotions such as joy, anger, and anxiety from tonal variations in human speech is an important task for research and applications in human-computer interaction. In preceding work, it was demonstrated that locally extracted speech features match or surpass the performance of the global features adopted in current approaches. In this continuing research, a backward context, which can also be considered a feature-vector memory, is shown to improve the prediction accuracy of the Speech Emotion Recognition engine. Preliminary results on a German emotional speech database show significant improvements over the results of the previous study.
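A minimal sketch of such a backward context in Python/NumPy, assuming per-frame feature vectors have already been extracted; the context depth and the zero padding at the start of an utterance are illustrative choices, not the paper's.

    import numpy as np

    def add_backward_context(features, k=3):
        """features: (n_frames, n_dims) -> (n_frames, (k + 1) * n_dims)."""
        n_frames, n_dims = features.shape
        padded = np.vstack([np.zeros((k, n_dims)), features])
        # row t of the output is frame t concatenated with frames t-1 ... t-k
        return np.hstack([padded[k - j : k - j + n_frames] for j in range(k + 1)])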
Citations: 10
Point cloud processing strategies for noise filtering, structural segmentation, and meshing of ground-based 3D Flash LIDAR images
Pub Date: 2010-10-01 | DOI: 10.1109/AIPR.2010.5759695
D. Natale, Matthew S. Baran, R. Tutwiler
Well-performing flash LIDAR focal plane array devices are now commercially available. Such devices give us the ability to measure and record frame-registered 3D point cloud sequences at video frame rates. For many 3D computer vision applications, this allows the processes of structure from motion or multi-view stereo reconstruction to be circumvented, letting us construct simpler, more efficient, and more robust 3D computer vision systems. This is a particular advantage for ground-based vision tasks that necessitate real-time or near real-time operation. The goal of this work is to introduce several important considerations for dealing with commercial 3D Flash LIDAR data and to describe useful strategies for noise filtering, structural segmentation, and meshing of ground-based data. With marginal refinement effort, the results of this work are directly applicable to many ground-based computer vision tasks.
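As one example of the noise-filtering strategies discussed, the sketch below (Python with NumPy/SciPy) applies statistical outlier removal to a frame's point cloud: points whose mean distance to their k nearest neighbors is far above the global average are dropped. The neighbor count and cutoff multiplier are illustrative assumptions, not the paper's values.

    import numpy as np
    from scipy.spatial import cKDTree

    def remove_outliers(points, k=8, std_mult=2.0):
        """points: (n, 3) array; returns the inlier subset."""
        tree = cKDTree(points)
        # query k+1 neighbors because each point is its own nearest neighbor
        dists, _ = tree.query(points, k=k + 1)
        mean_d = dists[:, 1:].mean(axis=1)
        keep = mean_d < mean_d.mean() + std_mult * mean_d.std()
        return points[keep]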
Citations: 6
Adaptive selection of visual and infra-red image fusion rules
Pub Date: 2010-10-01 | DOI: 10.1109/AIPR.2010.5759689
Guang Yang, Yafeng Yin, H. Man
The fusion of images captured from Electro-Optical (EO) and Infra-Red (IR) cameras has been extensively studied for military applications in recent years. In this paper, we propose a novel wavelet-based framework for online fusion of EO and IR image sequences. The proposed framework provides multiple fusion rules for image fusion as well as a novel edge-based evaluation method to select the optimal fusion rule for different practical scenarios. In the fusion step, EO and IR images are decomposed into different levels by a 2D discrete wavelet transform. The wavelet coefficients at each level are combined under a set of fusion rules, such as min-max selection, mean value, and weighted summation. Various fused images are obtained by inverse wavelet transform of the combined coefficients. In the evaluation step, the Sobel operator is applied to both the fused images and the original images. The edge information remaining in each fused image, relative to the original images, is calculated as the fusion quality assessment. Finally, the fused image with the highest assessment value is selected as the fusion result. In this way, the proposed method can adaptively select the best fusion rule for EO and IR images under different scenarios.
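A minimal sketch of this fuse-then-evaluate loop in Python (NumPy, PyWavelets, SciPy), assuming registered single-channel EO and IR frames; the wavelet, decomposition level, candidate rules, and edge score are illustrative stand-ins for the paper's exact choices.

    import numpy as np
    import pywt
    from scipy import ndimage

    def edge_energy(img):
        """Total Sobel gradient magnitude, used as the quality score."""
        return np.hypot(ndimage.sobel(img, axis=0), ndimage.sobel(img, axis=1)).sum()

    def fuse(eo, ir, rule, wavelet="db2", level=3):
        coeffs_a = pywt.wavedec2(eo, wavelet, level=level)
        coeffs_b = pywt.wavedec2(ir, wavelet, level=level)
        fused = []
        for a, b in zip(coeffs_a, coeffs_b):
            if isinstance(a, tuple):  # detail coefficients at one level
                fused.append(tuple(rule(x, y) for x, y in zip(a, b)))
            else:                     # coarse approximation coefficients
                fused.append(0.5 * (a + b))
        return pywt.waverec2(fused, wavelet)

    rules = {
        "max": lambda a, b: np.where(np.abs(a) >= np.abs(b), a, b),
        "mean": lambda a, b: 0.5 * (a + b),
        "weighted": lambda a, b: 0.7 * a + 0.3 * b,
    }

    def select_best_fusion(eo, ir):
        # keep the rule whose fused frame retains the most edge content
        candidates = {name: fuse(eo, ir, rule) for name, rule in rules.items()}
        return max(candidates.items(), key=lambda kv: edge_energy(kv[1]))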
Citations: 0
Synchronizing disparate video streams from laparoscopic operations in simulation-based surgical training
Pub Date: 2010-10-01 | DOI: 10.1109/AIPR.2010.5759691
Zheshen Wang, Baoxin Li
In this paper, we propose a novel approach for synchronizing multiple videos captured from common laparoscopic operations in simulation-based surgical training. The disparate video sources include two hand-view sequences and one tool-view sequence that does not contain any visual overlap with the hand views. Synchronization of the videos is essential for further visual analysis tasks. To the best of our knowledge, there is no prior work dealing with the synchronization of completely different visual streams capturing different aspects of the same physical event. In the proposed approach, histograms of dominant motion (HoDM) are extracted and used as per-frame features. Multi-view sequence correlation (MSC), computed as the accumulated product of pairwise correlations of HoDM magnitudes and the co-occurrence rate of pairwise HoDM orientation patterns, is proposed for ranking possible temporal alignments. The final relative shifts for synchronizing the videos are determined by maximizing both the overlap length of all sequences and the MSC scores through a coarse-to-fine search procedure. Experiments were performed on 41 groups of videos of two laparoscopic operations, and the performance was compared to a state-of-the-art method, demonstrating the effectiveness of the proposed approach.
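A simplified stand-in for the alignment search (Python/NumPy), assuming HoDM histograms have already been computed per frame for two streams: every candidate shift is scored by the mean correlation of the histograms over the overlapping window, and the best-scoring shift wins. The full MSC also uses orientation co-occurrence and a coarse-to-fine schedule, which this sketch omits.

    import numpy as np

    def best_shift(hodm_a, hodm_b, max_shift=100):
        """hodm_*: (n_frames, n_bins) per-frame histograms.
        Returns the shift of stream b relative to stream a."""
        def norm(x):
            x = x - x.mean(axis=1, keepdims=True)
            return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)
        a, b = norm(np.asarray(hodm_a, float)), norm(np.asarray(hodm_b, float))
        scores = {}
        for s in range(-max_shift, max_shift + 1):
            if s >= 0:  # b starts s frames after a
                ov_a, ov_b = a[s:], b[: max(len(a) - s, 0)]
            else:       # b starts -s frames before a
                ov_a, ov_b = a[: max(len(a) + s, 0)], b[-s:]
            n = min(len(ov_a), len(ov_b))
            if n > 0:
                # mean per-frame cosine correlation over the overlap
                scores[s] = (ov_a[:n] * ov_b[:n]).sum(axis=1).mean()
        return max(scores, key=scores.get)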
Citations: 1
Advanced image processing techniques for extracting regions of interest using multimode IR processing
Pub Date: 2010-10-01 | DOI: 10.1109/AIPR.2010.5759717
J. Caulfield, J. Havlicek
As large-format single-color and multicolor sensors proliferate, viewing and taking action on a much higher data volume becomes challenging for end users. We report on processing techniques to effectively extract targets utilizing multiple processing modes: both multiple spectral bands and multiple pre-processed data bands from Focal Plane Array (FPA) sensors. We have developed image processing techniques that address many of the key pre-processing requirements, including scene-based non-uniformity correction of static and dynamic pixels, multiband processing for object detection, and reduction and management of clutter and non-targets in a cluttered environment. Key motivations for these techniques include pre-processing that extracts the small percentage of the image set containing potentially high-likelihood targets and then transmits “active” pixel data while ignoring unchanging pixels. These techniques have demonstrated significant reductions in the raw data and allow the end user to more intelligently select potential data types for object identification without requiring a person in the loop.
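A minimal sketch of the “active pixel” idea in Python/NumPy: a running per-pixel background estimate is maintained, and only pixels deviating from it by more than a threshold are flagged for transmission. The learning rate and threshold are illustrative assumptions, not values from the paper.

    import numpy as np

    class ActivePixelExtractor:
        def __init__(self, shape, alpha=0.05, threshold=12.0):
            self.bg = np.zeros(shape, dtype=float)  # running background estimate
            self.alpha, self.threshold = alpha, threshold

        def process(self, frame):
            frame = frame.astype(float)
            active = np.abs(frame - self.bg) > self.threshold
            # update the background only where the scene is static
            self.bg = np.where(
                active, self.bg, (1 - self.alpha) * self.bg + self.alpha * frame
            )
            rows, cols = np.nonzero(active)
            return rows, cols, frame[active]  # sparse "active" pixel data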
Citations: 1
Vision physiology applied to hyperspectral short wave infrared imaging
Pub Date: 2010-10-01 | DOI: 10.1109/AIPR.2010.5759698
P. Willson, Gabriel Chan, Paul Yun
The hyperspectral space consisting of narrow spectral bands is neither an optimal nor an orthogonal feature space for identifying objects. In this paper we consider a means of reducing the hyperspectral feature space to a multispectral feature space that is orthogonal and optimal for separating objects from the background. The motivation for this work derives from the fact that the retina of the human eye uses only four broad, overlapping spectral response functions, yet it is optimal for detecting objects of multifarious colors in the visible region. Here we explore spectral response functions for the Short Wave Infrared (SWIR) region that are not sharp but broad, overlapping, and even more complex than those found in the retina for the visible region. Treating the measured intensities of the narrow spectral bands as feature vectors of the object of interest, we calculate a new vector space that is effectively a weighted average of the old space but is optimal for separating the object from the background.
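As one concrete way to compute such an optimal weighted average of narrow bands, the sketch below (Python/NumPy) fits a Fisher linear discriminant between object and background spectra; this is a stand-in, since the paper's actual construction of broad, overlapping response functions may differ.

    import numpy as np

    def fisher_band_weights(obj_pixels, bg_pixels):
        """obj_pixels, bg_pixels: (n_pixels, n_bands) spectra.
        Returns one broad, overlapping band weighting (unit norm)."""
        mu_o, mu_b = obj_pixels.mean(axis=0), bg_pixels.mean(axis=0)
        sw = np.cov(obj_pixels, rowvar=False) + np.cov(bg_pixels, rowvar=False)
        w = np.linalg.solve(sw + 1e-6 * np.eye(len(mu_o)), mu_o - mu_b)
        return w / np.linalg.norm(w)

    # Projecting each pixel's spectrum onto w, e.g. cube.reshape(-1, n_bands) @ w,
    # collapses the narrow bands into one channel that maximizes the Fisher
    # object/background separation criterion.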
Citations: 1
Tactical geospatial intelligence from full motion video
Pub Date: 2010-10-01 | DOI: 10.1109/AIPR.2010.5759699
R. Madison, Yuetian Xu
The current proliferation of Unmanned Aircraft Systems provides an increasing amount of full-motion video (FMV) that, among other things, encodes geospatial intelligence. But the FMV is rarely converted into useful products, so its intelligence potential is wasted. We have developed four concept demonstrations of methods to convert FMV into more immediately useful products: more accurate coordinates for objects of interest; timely, geo-registered, orthorectified imagery; conversion of mouse clicks to object coordinates; and first-person-perspective visualization of graphical control measures. We believe these concepts can convey valuable geospatial intelligence to the tactical user.
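As an illustration of the third concept (mouse clicks to object coordinates), the sketch below (Python/NumPy) back-projects a clicked pixel through a pinhole camera model and intersects the ray with a flat ground plane; the camera model and flat-earth assumption are illustrative, since the demonstrated system's sensor model and terrain handling are not described here.

    import numpy as np

    def click_to_ground(u, v, K, R, cam_pos, ground_z=0.0):
        """K: 3x3 intrinsics; R: world-from-camera rotation; cam_pos: (3,).
        Returns the world point where the pixel ray meets z = ground_z."""
        ray_cam = np.linalg.solve(K, np.array([u, v, 1.0]))  # pixel -> camera ray
        ray_world = R @ ray_cam
        t = (ground_z - cam_pos[2]) / ray_world[2]  # scale to reach the plane
        return cam_pos + t * ray_world              # object coordinates (x, y, z)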
Citations: 1