Semi-supervised fuzzy broad learning system based on mean-teacher model
Pub Date: 2024-02-28 | DOI: 10.1007/s10044-024-01217-8
Zizhu Fan, Yijing Huang, Chao Xi, Cheng Peng, Shitong Wang
Fuzzy broad learning system (FBLS) is a recently proposed fuzzy system that introduces the Takagi–Sugeno fuzzy model into the broad learning system. It has been shown that FBLS has better nonlinear fitting ability and faster computation than most previously proposed fuzzy neural networks, while requiring fewer rules and lower training cost. However, label errors or missing labels are common in large-scale datasets and greatly reduce the performance of FBLS. How to train a powerful classifier from limited label information is therefore an important challenge. To address this problem, we introduce the Mean-Teacher model into the fuzzy broad learning system: the Mean-Teacher model is used to rebuild the weights of the output layer of FBLS, and the Teacher–Student model is used to train FBLS. The proposed model is a semi-supervised learning method that integrates fuzzy logic and the broad learning system within a Mean-Teacher-based knowledge distillation framework. Extensive experiments demonstrate the strong performance of the Mean-Teacher-based fuzzy broad learning system (MT-FBLS).
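The Mean-Teacher update the abstract refers to is standard: the teacher's weights track the student's by exponential moving average, and a consistency loss ties the two models' predictions on unlabeled data. A minimal sketch in Python (illustrative names, not the authors' implementation):

```python
import numpy as np

def ema_update(teacher_w, student_w, alpha=0.99):
    """Teacher weights track the student by exponential moving average."""
    return alpha * teacher_w + (1.0 - alpha) * student_w

def consistency_loss(student_out, teacher_out):
    """Mean squared difference between student and teacher predictions
    on the same (differently perturbed) unlabeled inputs."""
    return np.mean((student_out - teacher_out) ** 2)

# Toy usage: the student is trained on labeled data as usual; on
# unlabeled data only the consistency term contributes.
rng = np.random.default_rng(0)
student_w = rng.normal(size=10)
teacher_w = student_w.copy()
for step in range(100):
    student_w -= 0.01 * rng.normal(size=10)  # stand-in for a gradient step
    teacher_w = ema_update(teacher_w, student_w)
```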
{"title":"Semi-supervised fuzzy broad learning system based on mean-teacher model","authors":"Zizhu Fan, Yijing Huang, Chao Xi, Cheng Peng, Shitong Wang","doi":"10.1007/s10044-024-01217-8","DOIUrl":"https://doi.org/10.1007/s10044-024-01217-8","url":null,"abstract":"<p>Fuzzy broad learning system (FBLS) is a newly proposed fuzzy system, which introduces Takagi–Sugeno fuzzy model into broad learning system. It has shown that FBLS has better nonlinear fitting ability and faster calculation speed than the most of fuzzy neural networks proposed earlier. At the same time, compared to other fuzzy neural networks, FBLS has fewer rules and lower cost of training time. However, label errors or missing are prone to appear in large-scale dataset, which will greatly reduce the performance of FBLS. Therefore, how to use limited label information to train a powerful classifier is an important challenge. In order to address this problem, we introduce Mean-Teacher model for the fuzzy broad learning system. We use the Mean-Teacher model to rebuild the weights of the output layer of FBLS, and use the Teacher–Student model to train FBLS. The proposed model is an implementation of semi-supervised learning which integrates fuzzy logic and broad learning system in the Mean-Teacher-based knowledge distillation framework. Finally, we have proved the great performance of Mean-Teacher-based fuzzy broad learning system (MT-FBLS) through a large number of experiments.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140009575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Selective bin model for reversible data hiding in encrypted images
Pub Date: 2024-02-28 | DOI: 10.1007/s10044-024-01220-z
Ruchi Agarwal, Sara Ahmed, Manoj Kumar
In tandem with fast-growing technology, secure data transmission over the Internet has become increasingly important. In digital media, embedding data in images is one of the most common methods for communicating confidential information. This paper proposes a novel reversible data hiding scheme for encrypted images based on selective bin models. The scheme focuses on enhancing the embedding capacity while ensuring image security through encryption and the proposed data hiding process. For data embedding, lossless compression is utilized and the image is classified into three bins; marker bits are then assigned to these bins to distinguish embeddable from non-embeddable regions. Owing to its selective bin approach, the proposed method achieves a satisfactory embedding rate for smooth as well as complex images. The method is also separable: data extraction and image recovery can be performed independently. Experimental results demonstrate the strategy's effectiveness compared with existing methods.
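The abstract does not specify the bin criterion, so only the separability property can be illustrated. A toy sketch, assuming XOR stream-cipher encryption and LSB embedding at positions selected by a data-hiding key; the lossless-compression step that makes the actual scheme reversible is omitted:

```python
import numpy as np

rng_img = np.random.default_rng(0)   # stand-in image content
rng_enc = np.random.default_rng(1)   # encryption key stream
rng_emb = np.random.default_rng(2)   # data-hiding key (positions)

img = rng_img.integers(0, 256, size=(8, 8), dtype=np.uint8)
cipher = img ^ rng_enc.integers(0, 256, size=img.shape, dtype=np.uint8)

# Embed payload bits into the LSBs of key-selected pixels.
payload = np.array([1, 0, 1, 1], dtype=np.uint8)
pos = rng_emb.choice(cipher.size, size=payload.size, replace=False)
flat = cipher.ravel().copy()
flat[pos] = (flat[pos] & 0xFE) | payload
marked = flat.reshape(cipher.shape)

# Separable extraction: the data-hiding key alone recovers the payload,
# without decrypting the image.
extracted = marked.ravel()[pos] & 1
assert np.array_equal(extracted, payload)
```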
Nonparametric K-means clustering-based adaptive unsupervised colour image segmentation
Pub Date: 2024-02-28 | DOI: 10.1007/s10044-024-01228-5
Zubair Khan, Jie Yang
{"title":"Nonparametric K-means clustering-based adaptive unsupervised colour image segmentation","authors":"Zubair Khan, Jie Yang","doi":"10.1007/s10044-024-01228-5","DOIUrl":"https://doi.org/10.1007/s10044-024-01228-5","url":null,"abstract":"","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140419388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Subdomain adaptation via correlation alignment with entropy minimization for unsupervised domain adaptation
Pub Date: 2024-02-28 | DOI: 10.1007/s10044-024-01232-9
Obsa Gilo, Jimson Mathew, Samrat Mondal, Rakesh Kumar Sandoniya
Unsupervised domain adaptation (UDA) is a well-explored domain in transfer learning, finding applications across various real-world scenarios. The central challenge in UDA lies in addressing the domain shift between training (source) and testing (target) data distributions. This study focuses on image classification tasks within UDA, where label spaces are shared but the target domain lacks labeled samples. Our primary objective is to mitigate the discrepancies between the source and target domains, ultimately facilitating robust generalization in the target domain. Domain adaptation techniques have traditionally concentrated on the global feature distribution to minimize disparities. However, these methods often overlook crucial, domain-specific subdomain information within identical classification categories, making it challenging to achieve the desired performance without fine-grained data. To tackle these challenges, we propose a unified framework, Subdomain Adaptation via Correlation Alignment with Entropy Minimization, for unsupervised domain adaptation. Our approach incorporates three techniques: (1) Local Maximum Mean Discrepancy, which aligns the means of local feature subsets, capturing intrinsic subdomain alignments often missed by global alignment, (2) correlation alignment, aimed at minimizing the correlation gap between domain distributions, and (3) entropy regularization applied to the target domain to encourage low-density separation between categories. We validate the proposed methods through rigorous experimental evaluations and ablation studies on standard benchmark datasets. The results consistently demonstrate the superior performance of our approach compared to existing state-of-the-art domain adaptation methods.
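The three loss terms have standard forms. A minimal numpy sketch, with local MMD simplified to class-wise mean matching under target pseudo-labels (an assumption for illustration; the paper's exact weighting may differ):

```python
import numpy as np

def coral_loss(Xs, Xt):
    """Correlation alignment: match second-order statistics of the
    source and target feature matrices (n_samples x d)."""
    d = Xs.shape[1]
    Cs = np.cov(Xs, rowvar=False)
    Ct = np.cov(Xt, rowvar=False)
    return np.sum((Cs - Ct) ** 2) / (4 * d * d)

def entropy_loss(probs, eps=1e-8):
    """Entropy regularizer on target predictions; low entropy pushes
    decision boundaries into low-density regions."""
    return -np.mean(np.sum(probs * np.log(probs + eps), axis=1))

def local_mmd(Xs, ys, Xt, yt_pseudo, num_classes):
    """Simplified local MMD: mean-embedding distance computed per class
    (true source labels vs. target pseudo-labels), then averaged."""
    total, used = 0.0, 0
    for c in range(num_classes):
        s, t = Xs[ys == c], Xt[yt_pseudo == c]
        if len(s) and len(t):
            total += np.sum((s.mean(axis=0) - t.mean(axis=0)) ** 2)
            used += 1
    return total / max(used, 1)
```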
Nonlinear dimensionality reduction with q-Gaussian distribution
Pub Date: 2024-02-28 | DOI: 10.1007/s10044-024-01210-1
Motoshi Abe, Yuichiro Nomura, Takio Kurita
In recent years, dimensionality reduction has become more important as the dimensionality of data used in tasks such as regression and classification has increased. t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) are popular nonlinear dimensionality reduction methods. However, the former outputs only one low-dimensional space, determined by the t-distribution, and the latter makes it difficult to control the distribution of pairwise distances in the low-dimensional space. To tackle these issues, we propose extensions of t-SNE and UMAP based on the q-Gaussian distribution, called q-Gaussian-distributed stochastic neighbor embedding (q-SNE) and q-Gaussian-distributed uniform manifold approximation and projection (q-UMAP). The q-Gaussian distribution is obtained by maximizing the Tsallis entropy under escort-distribution constraints on the mean and variance, and generalizes the Gaussian distribution with a hyperparameter q. Since the shape of the q-Gaussian distribution can be tuned smoothly by q, q-SNE and q-UMAP can intuitively derive different embedding spaces. To assess the quality of the proposed method, we compared visualizations of the low-dimensional embedding space and the k-NN classification accuracy in that space. Empirical results on MNIST, COIL-20, Olivetti Faces and Fashion-MNIST demonstrate that q-SNE and q-UMAP derive better embedding spaces than t-SNE and UMAP.
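The role of the hyperparameter q is visible in the similarity kernel itself: as q approaches 1 the q-Gaussian recovers the Gaussian used by SNE, while q = 2 yields the heavy-tailed Student-t kernel of t-SNE. A small sketch of the unnormalized kernel (an assumed form for illustration):

```python
import numpy as np

def q_gaussian(dist_sq, q, beta=1.0):
    """Unnormalized q-Gaussian kernel on squared distances; reduces to
    exp(-beta*d^2) as q -> 1 and to a Student-t tail for q > 1."""
    if abs(q - 1.0) < 1e-6:
        return np.exp(-beta * dist_sq)
    base = 1.0 - (1.0 - q) * beta * dist_sq
    return np.maximum(base, 0.0) ** (1.0 / (1.0 - q))

d2 = np.linspace(0.0, 10.0, 5)
print(q_gaussian(d2, q=1.0))  # Gaussian limit (SNE-like)
print(q_gaussian(d2, q=2.0))  # (1 + beta*d^2)^-1, the t-SNE kernel
```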
{"title":"Nonlinear dimensionality reduction with q-Gaussian distribution","authors":"Motoshi Abe, Yuichiro Nomura, Takio Kurita","doi":"10.1007/s10044-024-01210-1","DOIUrl":"https://doi.org/10.1007/s10044-024-01210-1","url":null,"abstract":"<p>In recent years, the dimensionality reduction has become more important as the number of dimensions of data used in various tasks such as regression and classification has increased. As popular nonlinear dimensionality reduction methods, t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) have been proposed. However, the former outputs only one low-dimensional space determined by the t-distribution and the latter is difficult to control the distribution of distance between each pair of samples in low-dimensional space. To tackle these issues, we propose novel t-SNE and UMAP extended by q-Gaussian distribution, called q-Gaussian-distributed stochastic neighbor embedding (q-SNE) and q-Gaussian-distributed uniform manifold approximation and projection (q-UMAP). The q-Gaussian distribution is a probability distribution derived by maximizing the tsallis entropy by escort distribution with mean and variance, and a generalized version of Gaussian distribution with a hyperparameter q. Since the shape of the q-Gaussian distribution can be tuned smoothly by the hyperparameter q, q-SNE and q-UMAP can in- tuitively derive different embedding spaces. To show the quality of the proposed method, we compared the visualization of the low-dimensional embedding space and the classification accuracy by k-NN in the low-dimensional space. Empirical results on MNIST, COIL-20, OliverttiFaces and FashionMNIST demonstrate that the q-SNE and q-UMAP can derive better embedding spaces than t-SNE and UMAP.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140009364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information theory divergences in principal component analysis
Pub Date: 2024-02-28 | DOI: 10.1007/s10044-024-01215-w
Eduardo K. Nakao, Alexandre L. M. Levada
Metric learning studies methodologies for finding the most appropriate distance function for a given dataset. Dimensionality reduction algorithms are closely related to metric learning because, in addition to obtaining a more compact representation of the data, such methods also implicitly derive a distance function that best represents the similarity between pairs of objects in the collection. Principal Component Analysis is a traditional linear dimensionality reduction algorithm that is still widely used. However, its procedure faithfully represents outliers in the generated space, which can be undesirable in pattern recognition applications. With this in mind, replacing the traditional pointwise approach with a contextual one based on the neighborhoods of the data samples has been proposed. This approach maps the usual feature space to a parametric feature space, where the difference between two samples is defined by a vector whose scalar coordinates are given by the statistical divergence between two probability distributions. For some divergences, the new approach has been shown to outperform several existing dimensionality reduction algorithms on a wide range of datasets. It is nevertheless important to investigate the framework's sensitivity to the choice of divergence. This paper presents experiments using the Total Variation, Rényi, Sharma-Mittal and Tsallis divergences, and the results evidence the method's robustness.
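For discrete neighborhood distributions (e.g., normalized histograms over k-NN neighborhoods, an assumed stand-in for the paper's distributions), the divergences tested have simple closed forms:

```python
import numpy as np

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * np.sum(np.abs(p - q))

def renyi(p, q, alpha=0.5, eps=1e-12):
    """Rényi divergence of order alpha (alpha != 1)."""
    return np.log(np.sum(p**alpha * q**(1.0 - alpha)) + eps) / (alpha - 1.0)

def tsallis(p, q_dist, q=0.5):
    """Tsallis divergence of order q (q != 1)."""
    return (np.sum(p**q * q_dist**(1.0 - q)) - 1.0) / (q - 1.0)

# The distance between two samples is then the divergence between the
# histograms of their respective neighborhoods.
p = np.array([0.2, 0.5, 0.3])
q = np.array([0.3, 0.3, 0.4])
print(total_variation(p, q), renyi(p, q), tsallis(p, q))
```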
{"title":"Information theory divergences in principal component analysis","authors":"Eduardo K. Nakao, Alexandre L. M. Levada","doi":"10.1007/s10044-024-01215-w","DOIUrl":"https://doi.org/10.1007/s10044-024-01215-w","url":null,"abstract":"<p>The metric learning area studies methodologies to find the most appropriate distance function for a given dataset. It was shown that dimensionality reduction algorithms are closely related to metric learning because, in addition to obtaining a more compact representation of the data, such methods also implicitly derive a distance function that best represents similarity between a pair of objects in the collection. Principal Component Analysis is a traditional linear dimensionality reduction algorithm that is still widely used by researchers. However, its procedure faithfully represents outliers in the generated space, which can be an undesirable characteristic in pattern recognition applications. With this is mind, it was proposed the replacement of the traditional punctual approach by a contextual one based on the data samples neighborhoods. This approach implements a mapping from the usual feature space to a parametric feature space, where the difference between two samples is defined by the vector whose scalar coordinates are given by the statistical divergence between two probability distributions. It was demonstrated for some divergences that the new approach outperforms several existing dimensionality reduction algorithms in a wide range of datasets. Although, it is important to investigate the framework divergence sensitivity. Experiments using Total Variation, Renyi, Sharma-Mittal and Tsallis divergences are exhibited in this paper and the results evidence the method robustness.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140009358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A deep learning approach to censored regression
Pub Date: 2024-02-28 | DOI: 10.1007/s10044-024-01216-9
Vlad-Rareş Dănăilă, Cătălin Buiu
In censored regression, the outcomes are a mixture of known values (uncensored) and open intervals (censored), meaning that the outcome is either known with precision or is an unknown value above or below a known threshold. Censored data are widespread, and modeling them correctly is essential for many applications. Although the literature on censored regression is vast, deep learning approaches have been applied less frequently. This paper proposes three loss functions for training neural networks on censored data using gradient backpropagation: the tobit likelihood, the censored mean squared error, and the censored mean absolute error. We experimented with three variants of the tobit likelihood, arising from different ways of modeling the standard deviation: as a fixed value, as a reparametrization, and as an estimate produced by a separate neural network for heteroscedastic data. The tobit model yielded better results, but the other two losses are simpler to implement. Another central idea of our research is that data are often censored and truncated simultaneously. The proposed losses can handle simultaneous censoring and truncation at arbitrary values from above and below.
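The tobit negative log-likelihood and the censored MSE can be written compactly. A sketch using scipy, where the censoring convention (+1 right-censored, i.e. the true value is at least y; -1 left-censored) is our labeling for illustration, not necessarily the paper's:

```python
import numpy as np
from scipy.stats import norm

def tobit_nll(y, mu, sigma, censor):
    """Negative log-likelihood: density for uncensored points,
    tail probability for censored ones."""
    z = (y - mu) / sigma
    ll = np.where(censor == 0, norm.logpdf(z) - np.log(sigma), 0.0)
    ll = np.where(censor == 1, norm.logsf(z), ll)   # P(Y >= y)
    ll = np.where(censor == -1, norm.logcdf(z), ll)  # P(Y <= y)
    return -np.mean(ll)

def censored_mse(y, pred, censor):
    """Censored MSE: penalize only violations of the censoring bound."""
    resid = pred - y
    loss = np.where(censor == 0, resid**2, 0.0)
    loss = np.where(censor == 1, np.minimum(resid, 0.0)**2, loss)
    loss = np.where(censor == -1, np.maximum(resid, 0.0)**2, loss)
    return np.mean(loss)
```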
{"title":"A deep learning approach to censored regression","authors":"Vlad-Rareş Dănăilă, Cătălin Buiu","doi":"10.1007/s10044-024-01216-9","DOIUrl":"https://doi.org/10.1007/s10044-024-01216-9","url":null,"abstract":"<p>In censored regression, the outcomes are a mixture of known values (uncensored) and open intervals (censored), meaning that the outcome is either known with precision or is an unknown value above or below a known threshold. The use of censored data is widespread, and correctly modeling it is essential for many applications. Although the literature on censored regression is vast, deep learning approaches have been less frequently applied. This paper proposes three loss functions for training neural networks on censored data using gradient backpropagation: the tobit likelihood, the censored mean squared error, and the censored mean absolute error. We experimented with three variations in the tobit likelihood that arose from different ways of modeling the standard deviation variable: as a fixed value, a reparametrization, and an estimation using a separate neural network for heteroscedastic data. The tobit model yielded better results, but the other two losses are simpler to implement. Another central idea of our research was that data are often censored and truncated simultaneously. The proposed losses can handle simultaneous censoring and truncation at arbitrary values from above and below.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140009363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Local complex features learned by randomized neural networks for texture analysis
Pub Date: 2024-02-28 | DOI: 10.1007/s10044-024-01230-x
Texture is a visual attribute widely used in many image analysis problems. Many learning-based methods have been proposed for texture discrimination, achieving improved performance over earlier handcrafted methods. In this paper, we present a new approach that combines a learning technique with complex network (CN) theory for texture analysis. The method exploits the representational capacity of CNs to model a texture image as a directed network and then uses the topological information of the vertices to train a randomized neural network. This neural network has a single hidden layer and uses a fast learning algorithm to learn local CN patterns for texture characterization. The weights of the trained neural network are then used to compose a feature vector. These feature vectors are evaluated in classification experiments on four widely used image databases. Experimental results show that the proposed method achieves high classification performance compared with other methods, indicating that our approach can be used in many image analysis problems.
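The appeal of the randomized network is that its "fast learning algorithm" reduces training to a single ridge-regression solve. A sketch of the generic ELM-style closed form assumed here, where X would hold the vertex topological measures and T the regression targets (names and shapes are illustrative):

```python
import numpy as np

def rnn_output_weights(X, T, n_hidden=64, lam=1e-3, seed=0):
    """Randomized neural network: fixed random hidden layer, output
    weights solved in closed form by ridge regression. The learned
    output weights themselves serve as the texture signature."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)  # random projection + nonlinearity
    beta = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ T)
    return beta.ravel()     # flattened into a feature vector

# Toy usage with random stand-in data.
X = np.random.default_rng(1).normal(size=(200, 5))
T = np.random.default_rng(2).normal(size=(200, 1))
feature = rnn_output_weights(X, T)
```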
{"title":"Local complex features learned by randomized neural networks for texture analysis","authors":"","doi":"10.1007/s10044-024-01230-x","DOIUrl":"https://doi.org/10.1007/s10044-024-01230-x","url":null,"abstract":"<h3>Abstract</h3> <p>Texture is a visual attribute largely used in many problems of image analysis. Many methods that use learning techniques have been proposed for texture discrimination, achieving improved performance over previous handcrafted methods. In this paper, we present a new approach that combines a learning technique and the complex network (CN) theory for texture analysis. This method takes advantage of the representation capacity of CN to model a texture image as a directed network and then uses the topological information of vertices to train a randomized neural network. This neural network has a single hidden layer and uses a fast learning algorithm to learn local CN patterns for texture characterization. Thus, we use the weights of the trained neural network to compose a feature vector. These feature vectors are evaluated in a classification experiment in four widely used image databases. Experimental results show a high classification performance of the proposed method compared to other methods, indicating that our approach can be used in many image analysis problems.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140009368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big topic modeling based on a two-level hierarchical latent Beta-Liouville allocation for large-scale data and parameter streaming
Pub Date: 2024-02-28 | DOI: 10.1007/s10044-024-01213-y
Koffi Eddy Ihou, Nizar Bouguila
As an extension of the standard symmetric latent Dirichlet allocation topic model, we implement the asymmetric Beta-Liouville distribution as a conjugate prior to the multinomial and therefore propose maximum a posteriori estimation for latent Beta-Liouville allocation as an alternative to maximum likelihood estimation for models such as probabilistic latent semantic indexing, unigrams, and mixtures of unigrams. Since most Bayesian posteriors for complex models are intractable in general, we propose a point estimate (the mode) that offers a much more tractable solution. Maximum a posteriori hypotheses based on point estimates are much easier to compute than a full Bayesian analysis that integrates over the entire parameter space. We show that the proposed maximum a posteriori estimator reduces the three-level hierarchical latent Beta-Liouville allocation to a two-level topic mixture as the latent variables are marginalized out. In each document, the maximum a posteriori provides a soft assignment and constructs dense expectation–maximization probabilities over each word (responsibilities) for accurate estimates. For simplicity, we present a stochastic word-level online expectation–maximization algorithm as an optimization method for maximum a posteriori latent Beta-Liouville allocation estimation, whose unnormalized reparameterization is equivalent to a stochastic collapsed variational Bayes. This implicit connection between the collapsed space and the expectation–maximization-based maximum a posteriori latent Beta-Liouville allocation shows the model's flexibility and provides an alternative to model selection. The proposed approach is efficient in its ability to stream both large-scale data and parameters seamlessly. The model's performance, evaluated using predictive perplexity, shows the robustness of the proposed technique on text document datasets.
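A rough sketch of one stochastic online EM step for a two-level topic mixture with MAP (mode) updates; for brevity the Beta-Liouville prior is replaced here by generic conjugate pseudo-counts, so this shows only the algorithmic skeleton, not the paper's exact estimator:

```python
import numpy as np

def online_em_step(doc_counts, pi, phi, rho, alpha=1.1, eta=1.01):
    """One stochastic EM step: pi (K,) topic proportions, phi (K, V)
    topic-word distributions, doc_counts (V,) one document's word
    counts, rho the online step size; alpha/eta are prior pseudo-counts
    (illustrative assumptions standing in for the Beta-Liouville prior)."""
    # E-step: responsibilities of each topic for this document.
    log_r = np.log(pi) + doc_counts @ np.log(phi.T)
    r = np.exp(log_r - log_r.max())
    r /= r.sum()
    # M-step: MAP (mode) estimates from prior-smoothed sufficient
    # statistics, blended into the running parameters.
    pi_hat = r + alpha - 1.0
    pi_hat /= pi_hat.sum()
    phi_hat = r[:, None] * doc_counts[None, :] + eta - 1.0
    phi_hat /= phi_hat.sum(axis=1, keepdims=True)
    pi = (1.0 - rho) * pi + rho * pi_hat
    phi = (1.0 - rho) * phi + rho * phi_hat
    return pi, phi
```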
{"title":"Big topic modeling based on a two-level hierarchical latent Beta-Liouville allocation for large-scale data and parameter streaming","authors":"Koffi Eddy Ihou, Nizar Bouguila","doi":"10.1007/s10044-024-01213-y","DOIUrl":"https://doi.org/10.1007/s10044-024-01213-y","url":null,"abstract":"<p>As an extension to the standard symmetric latent Dirichlet allocation topic model, we implement asymmetric Beta-Liouville as a conjugate prior to the multinomial and therefore propose the maximum a posteriori for latent Beta-Liouville allocation as an alternative to maximum likelihood estimator for models such as probabilistic latent semantic indexing, unigrams, and mixture of unigrams. Since most Bayesian posteriors, for complex models, are intractable in general, we propose a point estimate (the mode) that offers a much tractable solution. The maximum a posteriori hypotheses using point estimates are much easier than full Bayesian analysis that integrates over the entire parameter space. We show that the proposed maximum a posteriori reduces the three-level hierarchical latent Beta-Liouville allocation to two-level topic mixture as we marginalize out the latent variables. In each document, the maximum a posteriori provides a soft assignment and constructs dense expectation–maximization probabilities over each word (responsibilities) for accurate estimates. For simplicity, we present a stochastic at word-level online expectation–maximization algorithm as an optimization method for maximum a posteriori latent Beta-Liouville allocation estimation whose unnormalized reparameterization is equivalent to a stochastic collapsed variational Bayes. This implicit connection between the collapsed space and expectation–maximization-based maximum a posteriori latent Beta-Liouville allocation shows its flexibility and helps in providing alternative to model selection. We characterize efficiency in the proposed approach for its ability to simultaneously stream both large-scale data and parameters seamlessly. The performance of the model using predictive perplexities as evaluation method shows the robustness of the proposed technique with text document datasets.</p>","PeriodicalId":54639,"journal":{"name":"Pattern Analysis and Applications","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140009371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}