Color attention tracking with score matching
Xuedong He, Jiehui Huang
Pub Date: 2024-08-24 | DOI: 10.1007/s13042-024-02316-y
It is common practice to use deep networks to extract deep features from RGB images. Typically, popular trackers adopt a pre-trained ResNet backbone to extract target features, achieving excellent performance. Moreover, Staple has shown that color statistics provide complementary cues, yet the combination of color statistics and deep features in a unified deep framework has rarely been reported. We therefore employ color statistics to construct color attention maps, which are encoded into the deep network to guide the generation of target-aware feature maps. Additionally, DCF-based trackers rely on an online update module to dynamically update the tracking model, so it is particularly necessary to collect reliable target samples. Hence, drawing on the idea of template matching, we design a score matching method that scores the tracked targets and has the advantage of taking the target extent into account. We conduct thorough ablation analyses of the color attention module and the score matching method to verify their effectiveness. Furthermore, our approaches are integrated into DCF frameworks to construct two new trackers, and both quantitative and qualitative results demonstrate that our trackers perform favorably against recent, far more sophisticated trackers on multiple public benchmarks.
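A color attention map of this flavor can be built from foreground/background color histograms, Staple-style. Below is a minimal sketch, assuming 16-bin per-channel RGB histograms and a Bayes-rule foreground probability; the bin count, the smoothing, and the way the map modulates deep features are illustrative choices, not the paper's exact design.

```python
# Minimal sketch: per-pixel foreground probability from fg/bg color statistics.
import numpy as np

def color_attention_map(image, fg_box, n_bins=16):
    """image: (h, w, 3) uint8; fg_box: (x0, y0, x1, y1) target box."""
    h, w, _ = image.shape
    bins = (image // (256 // n_bins)).astype(int)               # quantize colors
    idx = bins[..., 0] * n_bins**2 + bins[..., 1] * n_bins + bins[..., 2]

    x0, y0, x1, y1 = fg_box
    fg_mask = np.zeros((h, w), dtype=bool)
    fg_mask[y0:y1, x0:x1] = True

    fg_hist = np.bincount(idx[fg_mask], minlength=n_bins**3) + 1.0   # Laplace smoothing
    bg_hist = np.bincount(idx[~fg_mask], minlength=n_bins**3) + 1.0
    fg_hist /= fg_hist.sum()
    bg_hist /= bg_hist.sum()

    # P(fg | color) via Bayes' rule with a uniform prior
    return fg_hist[idx] / (fg_hist[idx] + bg_hist[idx])          # (h, w) map in [0, 1]

# The map can then modulate deep feature maps, e.g. features * attention[None].
```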
{"title":"Color attention tracking with score matching","authors":"Xuedong He, Jiehui Huang","doi":"10.1007/s13042-024-02316-y","DOIUrl":"https://doi.org/10.1007/s13042-024-02316-y","url":null,"abstract":"<p>It is an ordinary practice that deep networks are utilized to extract deep features from RGB images. Typically, the popular trackers adopt pre-trained ResNet as a backbone to extract target features, achieving excellent performance. Moreover, Staple has shown that color statistics have complementary cues, while the combination of color statistics and deep features in a unified deep framework has rarely been reported. Therefore, we employ color statistics to construct color attention maps, which are encoded into the deep network to guide the generation of target-aware feature maps. Additionally, DCF-based trackers have an online update module to dynamically update the tracking model, it is particularly necessary to collect reliable target samples. Hence, we refer to the template matching thought to design a score matching method, which is intended to score the tracked targets, this method has the advantage of considering the target extent. In this paper, we conduct sufficient ablation analyses on the color attention module and score matching method to verify their effectiveness. Furthermore, our approaches are combined into the DCF frameworks to construct two brand-new trackers, and both quantitative and qualitative results demonstrate that our trackers can perform favorably against recent and far more sophisticated trackers on multiple public benchmarks.</p>","PeriodicalId":51327,"journal":{"name":"International Journal of Machine Learning and Cybernetics","volume":"62 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142209179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dap-SiMT: divergence-based adaptive policy for simultaneous machine translation
Libo Zhao, Ziqian Zeng
Pub Date: 2024-08-23 | DOI: 10.1007/s13042-024-02323-z
In the realm of Simultaneous Machine Translation (SiMT), a robust read/write (R/W) policy is essential alongside a high-quality translation model. Traditional methods typically employ either a fixed wait-k policy in sync with a wait-k translation model or an adaptive policy that is co-developed with a dedicated translation model. This study introduces a more versatile approach by decoupling the adaptive policy from the translation model. Our rationale is based on the finding that an independent multi-path wait-k model, when combined with adaptive policies utilized in advanced SiMT systems, can perform competitively. Specifically, we present DaP, a divergence-based adaptive policy, which dynamically adjusts read/write decisions for any translation model, taking into account potential divergence in translation distributions resulting from future information. Extensive experiments across multiple benchmarks reveal that our method significantly enhances the balance between translation accuracy and latency, surpassing strong baselines.
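As a rough illustration of the divergence idea: if the next-token distribution would shift noticeably once an additional source token is taken into account, the policy reads; otherwise it writes. The sketch below assumes a model exposing next-token log-probabilities; both that interface and the KL threshold are hypothetical, not DaP's exact formulation.

```python
# Minimal sketch of a divergence-based read/write decision.
import torch
import torch.nn.functional as F

def read_or_write(model, src_prefix, src_more, tgt_prefix, threshold=0.05):
    """WRITE if one more source token barely changes the prediction, else READ."""
    with torch.no_grad():
        log_p_now  = model.next_token_log_probs(src_prefix, tgt_prefix)  # hypothetical API
        log_p_more = model.next_token_log_probs(src_more, tgt_prefix)
    # KL(p_more || p_now): how much the extra source context shifts the prediction
    kl = F.kl_div(log_p_now, log_p_more.exp(), reduction="sum")
    return "READ" if kl > threshold else "WRITE"
```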
{"title":"Dap-SiMT: divergence-based adaptive policy for simultaneous machine translation","authors":"Libo Zhao, Ziqian Zeng","doi":"10.1007/s13042-024-02323-z","DOIUrl":"https://doi.org/10.1007/s13042-024-02323-z","url":null,"abstract":"<p>In the realm of Simultaneous Machine Translation (SiMT), a robust read/write (R/W) policy is essential alongside a high-quality translation model. Traditional methods typically employ either a fixed wait-<i>k</i> policy in sync with a wait-<i>k</i> translation model or an adaptive policy that is co-developed with a dedicated translation model. This study introduces a more versatile approach by decoupling the adaptive policy from the translation model. Our rationale is based on the finding that an independent multi-path wait-<i>k</i> model, when combined with adaptive policies utilized in advanced SiMT systems, can perform competitively. Specifically, we present DaP, a divergence-based adaptive policy, which dynamically adjusts read/write decisions for any translation model, taking into account potential divergence in translation distributions resulting from future information. Extensive experiments across multiple benchmarks reveal that our method significantly enhances the balance between translation accuracy and latency, surpassing strong baselines.</p>","PeriodicalId":51327,"journal":{"name":"International Journal of Machine Learning and Cybernetics","volume":"270 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142209181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
TL-LFF Net: transfer learning based lighter, faster, and frozen network for the detection of multi-scale mixed intracranial hemorrhages through genetic optimization algorithm
Lakshmi Prasanna Kothala, Sitaramanjaneya Reddy Guntur
Pub Date: 2024-08-23 | DOI: 10.1007/s13042-024-02324-y
Computed tomography (CT) is the most commonly used imaging method for intracranial hemorrhage (ICH). Although deep learning (DL) models are well suited to detecting and segmenting multi-class hemorrhages, localizing multi-scale mixed hemorrhages with limited resources, such as bounding boxes, is difficult. To address this issue, this study proposes a novel transfer learning-based TL-LFF Network. To detect multi-scale mixed hemorrhages, the proposed model employs a backbone module that extracts in-depth features from the input images and a spatial pyramid pooling faster layer that performs pooling at multiple levels. In the neck section, a path aggregation network (PANet) is used to preserve spatial information. To keep the model lightweight, the backbone and neck modules are frozen during backpropagation, which by itself lowers detection accuracy; transfer learning is therefore applied to recover detection capability while remaining lightweight, and it significantly improves the model's accuracy. In addition, a Genetic Algorithm (GA) is used to optimize the hyperparameters, with mutation generating new offspring from previous generations. The brain hemorrhage extended dataset was used to train and validate the proposed model. In terms of detection metrics and lightweight criteria, the experimental results show that the proposed model outperforms existing models. The proposed model can therefore be used in clinical settings to reduce radiologists' CT scan reading time.
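The freeze-then-transfer step might look like the following PyTorch sketch, assuming a detector that exposes backbone, neck, and head submodules; the module names and optimizer settings are illustrative, not the paper's exact configuration.

```python
# Minimal sketch: freeze backbone and neck so only the detection head is trained.
import torch

def freeze_for_transfer(detector):
    for module in (detector.backbone, detector.neck):   # hypothetical submodule names
        for p in module.parameters():
            p.requires_grad = False
    # The optimizer only sees the remaining trainable (head) parameters.
    trainable = [p for p in detector.parameters() if p.requires_grad]
    return torch.optim.SGD(trainable, lr=0.01, momentum=0.937)
```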
{"title":"TL-LFF Net: transfer learning based lighter, faster, and frozen network for the detection of multi-scale mixed intracranial hemorrhages through genetic optimization algorithm","authors":"Lakshmi Prasanna Kothala, Sitaramanjaneya Reddy Guntur","doi":"10.1007/s13042-024-02324-y","DOIUrl":"https://doi.org/10.1007/s13042-024-02324-y","url":null,"abstract":"<p>Computed tomography (CT) is the most commonly used imaging method in intracranial hemorrhage (ICH). Although deep learning (DL) models are well suited for detecting and segmenting multi-class hemorrhages, localizing multi-scale mixed hemorrhages with limited resources such as bounding boxes is difficult. To address this issue, the current study proposes a novel transfer learning-based TL-LFF Network. To detect multi-scale mixed hemorrhages, the proposed model employs a backbone module that extracts in-depth features from the input images, and a spatial pyramid pooling faster layer that performs the pooling operation at various levels. In the neck section, a path aggregated network (PANet) is used to store spatial information. Furthermore, to achieve a lightweight nature, the proposed backbone and neck modules were frozen during the backpropagation stage, resulting in a decrease in detection accuracy. To improve detection capability while remaining lightweight, a concept known as transfer learning is used. This strategy significantly improves the accuracy of the proposed model. In addition, the Genetic Algorithm (GA) concept is used to optimize the hyperparameters, where the mutation is used to develop new offspring based on previous generations. The brain hemorrhage extended dataset was used to train and validate the proposed model. In terms of detection metrics and lightweight criteria, the experimental results showed that the proposed model performed better when compared to other existing models. As a result, we can use the proposed model in the clinical implementation stage to reduce the radiologist's CT scan read time.</p>","PeriodicalId":51327,"journal":{"name":"International Journal of Machine Learning and Cybernetics","volume":"46 2 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142209184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A hierarchical dual-view model for fake news detection guided by discriminative lexicons
Sijia Yang, Xianyong Li, Yajun Du, Dong Huang, Xiaoliang Chen, Yongquan Fan, Shumin Wang
Pub Date: 2024-08-23 | DOI: 10.1007/s13042-024-02322-0
Fake news detection aims to automatically identify the credibility of source posts, mitigating potential societal harm and conserving human resources. Textual fake news detection methods can be categorized as pattern-based or fact-based. Pattern-based models focus on identifying shared writing patterns in source posts, while fact-based models leverage auxiliary external knowledge. Researchers have recently attempted to merge these two views into a comprehensive detection system, achieving superior performance to single-view methods. However, existing dual-view methods often prioritize integrating single-view methods over exploring the nuanced characteristics of both perspectives. To address this, we propose a novel hierarchical dual-view model for fake news detection guided by discriminative lexicons. First, we construct two lexicons based on the distinct word usage tendencies of fake and real news and further augment them with synonyms sourced from large language models. We then devise a hierarchical attention network to derive semantic representations of the source post, incorporating a lexicon attention loss to guide the prioritization of words from the two lexicons. Subsequently, a lexicon-guided interaction network is employed to model the relations between the source post and its relevant articles, assigning authenticity-aware weights to each article. Finally, the representations of the source post and relevant articles are concatenated for joint detection. Experimental results show that our model outperforms many competitive baselines, improving the macro F1 score by 1.1% to 10.5% on Weibo and by 3.2% to 10.8% on Twitter.
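One way to realize a lexicon attention loss is to penalize attention distributions that place little mass on lexicon words. A minimal sketch follows, assuming word-level attention weights and a binary lexicon mask; the paper's exact loss form may differ.

```python
# Minimal sketch: push attention mass toward discriminative-lexicon tokens.
import torch

def lexicon_attention_loss(attn, lexicon_mask, eps=1e-8):
    """
    attn:         (batch, seq_len) attention weights summing to 1 per row
    lexicon_mask: (batch, seq_len) 1 where the token appears in either lexicon
    """
    mass_on_lexicon = (attn * lexicon_mask).sum(dim=-1)      # (batch,)
    return -torch.log(mass_on_lexicon + eps).mean()          # maximize that mass
```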
Extraction of entity relationships serving the field of agriculture food safety regulation
Zhihua Zhao, Yiming Liu, Dongdong Lv, Ruixuan Li, Xudong Yu, Dianhui Mao
Pub Date: 2024-08-21 | DOI: 10.1007/s13042-024-02304-2
Agriculture food (agri-food) safety is closely related to all aspects of people's lives. In recent years, with the emergence of deep learning technology based on big data, relation extraction in the field of agri-food safety supervision has become a research hotspot. However, most current work merely extends relation recognition on top of the traditional named entity recognition task, which makes it difficult to establish a true 'connection' between entities and relations. The pipelined and joint extraction architectures that have emerged in this area are problematic in practice. In addition, the contextual information of the text corpus in the agri-food safety regulatory domain has not been fully utilized. To address these issues, this paper proposes a semi-joint entity relation extraction model (EB-SJRE) based on contextual entity boundary features. First, a token-pair subject-object correspondence matrix label is designed to intuitively model subject-object boundaries, which handles the complex entities of agri-food safety regulation more gracefully. Second, dynamic fine-tuning of BERT makes the text embedding more relevant to the textual context of the agri-food safety regulation domain. Finally, we introduce an attention mechanism into the token-pair tagging framework to capture deep semantic subject-object boundary associations, which neatly avoids both the exposure bias of pipeline structures and the dimensional explosion of joint extraction structures. The experimental results show that our model achieves the best F1-score of 88.71% on agri-food safety regulation domain data and F1-scores of 92.36%, 92.80%, 88.91%, and 92.21% on NYT, NYT-star, WebNLG, and WebNLG-star, respectively. This indicates that EB-SJRE generalizes well in both the agri-food safety regulatory domain and general domains.
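A token-pair subject-object correspondence matrix of this kind can be pictured as one seq_len x seq_len grid per relation, marking boundary pairs. The sketch below assumes labels at (subject start, object start) only; the paper's actual tagging scheme may mark additional boundary positions.

```python
# Minimal sketch: token-pair subject-object correspondence matrix labels.
import numpy as np

def build_token_pair_labels(seq_len, triples, n_relations):
    """triples: list of (subj_start, obj_start, relation_id) token indices."""
    labels = np.zeros((n_relations, seq_len, seq_len), dtype=np.int8)
    for subj, obj, rel in triples:
        labels[rel, subj, obj] = 1        # subject/object boundary pair for relation rel
    return labels

# Hypothetical example: token 3 (subject) relates to token 7 (object) via relation 0.
labels = build_token_pair_labels(seq_len=12, triples=[(3, 7, 0)], n_relations=4)
```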
{"title":"Extraction of entity relationships serving the field of agriculture food safety regulation","authors":"Zhihua Zhao, Yiming Liu, Dongdong Lv, Ruixuan Li, Xudong Yu, Dianhui Mao","doi":"10.1007/s13042-024-02304-2","DOIUrl":"https://doi.org/10.1007/s13042-024-02304-2","url":null,"abstract":"<p>Agriculture food (agri-food) safety is closely related to all aspects of people's lives. In recent years, with the emergence of deep learning technology based on big data, the extraction of information relations in the field of agri-food safety supervision has become a research hotspot. However, most of the current work only expands the relationship recognition based on the traditional named entity recognition task, which makes it difficult to establish a true 'connection' between entities and relationships. The pipelined and federated extraction architectures that have emerged in this area are problematic in practice. In addition, the contextual information of the text corpus in the agri-food safety regulatory domain has not been fully utilized. To address the above issues, this paper proposes a semi-joint entity relationship extraction model (EB-SJRE) based on contextual entity boundary features. Firstly, a Token pair subject-object correspondence matrix label is designed to intuitively model the subject-object boundary, which is more friendly to complex entities in the field of agri-food safety regulation. Secondly, the dynamic fine-tuning of Bert makes the text embedding more relevant to the textual context of the agri-food safety regulation domain. Finally, we introduce an attention mechanism in the Token pair tagging framework to capture deep semantic subject-object boundary association information, which cleverly solves the problem of bias exposure due to the pipeline structure and the dimensional explosion due to the joint extraction structure. The experimental results show that our model achieves the best F1-score of 88.71% on agri-food safety regulation domain data and F1-scores of 92.36%, 92.80%, 88.91%, and 92.21% on NYT, NYT-star, WebNLG, and WebNLG-star, respectively. This indicates that EB-SJRE has excellent generalization ability in both the agri-food safety regulatory and public sectors.</p>","PeriodicalId":51327,"journal":{"name":"International Journal of Machine Learning and Cybernetics","volume":"28 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142209201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anchor-based Domain Adaptive Hashing for unsupervised image retrieval
Yonghao Chen, Xiaozhao Fang, Yuanyuan Liu, Xi Hu, Na Han, Peipei Kang
Pub Date: 2024-08-21 | DOI: 10.1007/s13042-024-02298-x
Traditional image retrieval methods suffer significant performance degradation when the model is trained on one dataset but run on another. To address this issue, Domain Adaptive Retrieval (DAR) has emerged as a promising solution, specifically designed to overcome domain shifts in retrieval tasks. However, existing unsupervised DAR methods still face two primary limitations: (1) they under-explore the intrinsic structure among domains, resulting in limited generalization capabilities; and (2) the models are often too complex to be applied to large-scale datasets. To tackle these limitations, we propose a novel unsupervised DAR method named Anchor-based Domain Adaptive Hashing (ADAH). ADAH aims to exploit the commonalities among domains under the assumption that a consensus latent space exists for the source and target domains. To achieve this, an anchor-based similarity reconstruction scheme is proposed, which learns a set of domain-shared anchors and domain-specific anchor graphs and then reconstructs the similarity matrix from these anchor graphs, thereby effectively exploiting inter- and intra-domain similarity structures. Subsequently, by treating the anchor graphs as feature embeddings, we solve the Distance-Distance Difference Minimization (DDDM) problem between them and their corresponding hash codes, which preserves the similarity structure of the similarity matrix in the hash codes. Finally, a two-stage strategy is employed to derive the hash function, ensuring its effectiveness and scalability. Experimental results on four datasets demonstrate the effectiveness of the proposed method.
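The anchor-graph idea can be illustrated compactly: each sample is softly assigned to a small set of shared anchors, and cross-domain similarity is reconstructed through those assignments. A minimal sketch, assuming Gaussian-kernel affinities with row normalization; ADAH's actual anchor learning and normalization may differ.

```python
# Minimal sketch: anchor-based similarity reconstruction across two domains.
import numpy as np

def anchor_graph(features, anchors, sigma=1.0):
    """Soft assignment of each sample to a shared set of anchors."""
    d2 = ((features[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    z = np.exp(-d2 / (2 * sigma**2))
    return z / z.sum(axis=1, keepdims=True)           # (n_samples, n_anchors)

anchors = np.random.rand(10, 64)                      # domain-shared anchors
Zs = anchor_graph(np.random.rand(100, 64), anchors)   # source anchor graph
Zt = anchor_graph(np.random.rand(80, 64), anchors)    # target anchor graph
S = Zs @ Zt.T    # (100, 80): samples are similar if they agree on the anchors
```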
Single-stage zero-shot object detection network based on CLIP and pseudo-labeling
Jiafeng Li, Shengyao Sun, Kang Zhang, Jing Zhang, Li Zhuo
Pub Date: 2024-08-20 | DOI: 10.1007/s13042-024-02321-1
The detection of unknown objects is a challenging task in computer vision because, although real-world object categories are diverse, existing object-detection training sets cover only a limited number of them. Most existing approaches use two-stage networks to improve a model's ability to characterize objects of unknown classes, which leads to slow inference. To address this issue, we propose a single-stage unknown object detection method based on the contrastive language-image pre-training (CLIP) model and pseudo-labeling, called CLIP-YOLO. First, a visual language embedding alignment method is introduced, and a channel-grouped enhanced coordinate attention module is embedded into the YOLO-series detection head and feature-enhancing component to improve the model's ability to characterize and detect unknown-category objects. Second, pseudo-label generation is optimized based on the CLIP model to expand the diversity of the training set and improve coverage of unknown object categories. We validated this method on four challenging datasets: MSCOCO, ILSVRC, Visual Genome, and PASCAL VOC. The results show that our method achieves higher accuracy and faster inference, yielding better unknown-object detection performance. The source code is available at https://github.com/BJUTsipl/CLIP-YOLO.
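The CLIP-based pseudo-labeling step can be approximated as follows: crop each region proposal, score it against text prompts for candidate categories, and keep confident matches as pseudo-labels. This sketch uses OpenAI's `clip` package; the similarity threshold and category list are illustrative, and the real CLIP-YOLO pipeline is more involved.

```python
# Minimal sketch: CLIP pseudo-labeling of region-proposal crops.
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
categories = ["dog", "bicycle", "traffic light"]       # illustrative candidates
text = clip.tokenize([f"a photo of a {c}" for c in categories]).to(device)

def pseudo_label(crop: Image.Image, threshold=0.3):
    """Assign a category to a proposal crop, or None if CLIP is unsure."""
    image = preprocess(crop).unsqueeze(0).to(device)
    with torch.no_grad():
        img_f = model.encode_image(image)
        txt_f = model.encode_text(text)
        img_f = img_f / img_f.norm(dim=-1, keepdim=True)
        txt_f = txt_f / txt_f.norm(dim=-1, keepdim=True)
        sims = (img_f @ txt_f.T).squeeze(0)            # cosine similarities
    best = sims.argmax().item()
    return categories[best] if sims[best] > threshold else None
```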
{"title":"Single-stage zero-shot object detection network based on CLIP and pseudo-labeling","authors":"Jiafeng Li, Shengyao Sun, Kang Zhang, Jing Zhang, Li Zhuo","doi":"10.1007/s13042-024-02321-1","DOIUrl":"https://doi.org/10.1007/s13042-024-02321-1","url":null,"abstract":"<p>The detection of unknown objects is a challenging task in computer vision because, although there are diverse real-world detection object categories, existing object-detection training sets cover a limited number of object categories . Most existing approaches use two-stage networks to improve a model’s ability to characterize objects of unknown classes, which leads to slow inference. To address this issue, we proposed a single-stage unknown object detection method based on the contrastive language-image pre-training (CLIP) model and pseudo-labelling, called CLIP-YOLO. First, a visual language embedding alignment method is introduced and a channel-grouped enhanced coordinate attention module is embedded into a YOLO-series detection head and feature-enhancing component, to improve the model’s ability to characterize and detect unknown category objects. Second, the pseudo-labelling generation is optimized based on the CLIP model to expand the diversity of the training set and enhance the ability to cover unknown object categories. We validated this method on four challenging datasets: MSCOCO, ILSVRC, Visual Genome, and PASCAL VOC. The results show that our method can achieve higher accuracy and faster speed, so as to obtain better performance of unknown object detection. The source code is available at https://github.com/BJUTsipl/CLIP-YOLO.</p>","PeriodicalId":51327,"journal":{"name":"International Journal of Machine Learning and Cybernetics","volume":"41 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142208923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Advancing automated street crime detection: a drone-based system integrating CNN models and enhanced feature selection techniques
Lakshma Reddy Vuyyuru, NagaMalleswara Rao Purimetla, Kancharakunt Yakub Reddy, Sai Srinivas Vellela, Sk Khader Basha, Ramesh Vatambeti
Pub Date: 2024-08-20 | DOI: 10.1007/s13042-024-02315-z
This study addresses the growing challenge of escalating global crime rates by introducing an automated drone-based street crime detection system. Leveraging advanced Convolutional Neural Network (CNN) models, the system integrates several key components for analyzing images captured by drones. Initially, the Embedding Bilateral Filter (EBF) technique divides images into base and detail layers to enhance detection accuracy. The fusion model, IR with attention-based Conv-ViT, combines Inception-V3, ResNet-50, and the Convolution Vision Transformer (Conv-ViT) to capture both shape and texture details efficiently. Further enhancement comes from the Improved Shark Smell Optimization Algorithm (ISSOA), which optimizes feature selection and minimizes redundancy in image extraction. Additionally, a Multi-scale Contextual Semantic Guidance Network (MCS-GNet) ensures robust image classification by integrating features from multiple layers to prevent information loss. Evaluation on the UCF-Crime and UCSD Ped2 datasets demonstrates superior accuracy, with results of 0.783 and 0.974, respectively. This approach offers a promising alternative to the arduous, continuous task of manually monitoring security camera feeds for suspicious activity, addressing the pressing need for automated crime detection systems on a global scale.
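The base/detail split at the heart of the EBF step can be illustrated with a standard bilateral filter: the edge-preserving filtered image serves as the base layer, and the residual carries the detail. The sketch below uses OpenCV's bilateral filter as a stand-in for the paper's EBF technique; the filter parameters and file name are illustrative.

```python
# Minimal sketch: bilateral base/detail decomposition of a drone frame.
import cv2
import numpy as np

def base_detail_split(image, d=9, sigma_color=75, sigma_space=75):
    """Split an image into an edge-preserving base layer and a detail layer."""
    base = cv2.bilateralFilter(image, d, sigma_color, sigma_space)
    detail = image.astype(np.int16) - base.astype(np.int16)   # signed residual
    return base, detail

frame = cv2.imread("drone_frame.jpg")                  # hypothetical input frame
base, detail = base_detail_split(frame)
```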
{"title":"Advancing automated street crime detection: a drone-based system integrating CNN models and enhanced feature selection techniques","authors":"Lakshma Reddy Vuyyuru, NagaMalleswara Rao Purimetla, Kancharakunt Yakub Reddy, Sai Srinivas Vellela, Sk Khader Basha, Ramesh Vatambeti","doi":"10.1007/s13042-024-02315-z","DOIUrl":"https://doi.org/10.1007/s13042-024-02315-z","url":null,"abstract":"<p>This study presents a pioneering solution to the growing challenge of escalating global crime rates through the introduction of an automated drone-based street crime detection system. Leveraging advanced Convolutional Neural Network (CNN) models, the system integrates several key components for analyzing images captured by drones. Initially, the Embedding Bilateral Filter (EBF) technique divides images into base and detail layers to enhance detection accuracy. The fusion model, IR with attention-based Conv-ViT, combines Inception-V3, ResNet-50, and Convolution Vision Transformer (Conv-ViT) to capture both shape and texture details efficiently. Further enhancement is achieved through the Improved Shark Smell Optimization Algorithm (ISSOA), which optimizes feature selection and minimizes redundancy in image extraction. Additionally, a Multi-scale Contextual Semantic Guidance Network (MCS-GNet) ensures robust image classification by integrating features from multiple layers to prevent data loss. Evaluation on the UCF-Crime and UCSD Ped2 datasets demonstrates superior accuracy, with remarkable results of 0.783 and 0.974, respectively. This innovative approach offers a promising solution to the arduous and continuous task of monitoring security camera feeds for suspicious activities, thereby addressing the pressing need for automated crime detection systems on a global scale.</p>","PeriodicalId":51327,"journal":{"name":"International Journal of Machine Learning and Cybernetics","volume":"19 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142209183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ETCGN: entity type-constrained graph networks for document-level relation extraction
Hangxiao Yang, Changpu Chen, Shaokai Zhang, Baiyang Chen, Chang Liu, Qilin Li
Pub Date: 2024-08-20 | DOI: 10.1007/s13042-024-02293-2
Document-level relation extraction aims to discern semantic connections between entities within a given document. Compared with sentence-level relation extraction, its complexity lies in requiring models to infer semantic relations across multiple sentences. In this paper, we propose a novel model named Entity Type-Constrained Graph Network (ETCGN). The proposed model uses a graph structure to capture the intricate interactions among the diverse mentions within a document. Moreover, it aggregates mentions of the same entity and integrates path-based reasoning mechanisms to deduce relations between entities. Furthermore, we present a novel constraint method that capitalizes on entity types to confine the scope of potential relations. Experimental results on two public datasets (DocRED and HacRED) show that our model outperforms a number of baselines and achieves state-of-the-art performance. Further analysis verifies the effectiveness of the type-based constraints and path-based reasoning mechanisms. Our code is available at: https://github.com/yhx30/ETCGN.
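An entity-type constraint of this kind can be pictured as masking relation logits that are incompatible with the (subject type, object type) pair. A minimal sketch, assuming a hand-built compatibility table and -inf masking; ETCGN's actual mechanism may differ.

```python
# Minimal sketch: restrict relation logits by entity-type compatibility.
import torch

N_REL = 5
# compatible[(subject_type, object_type)] -> allowed relation ids (illustrative)
compatible = {("PER", "ORG"): [0, 2], ("ORG", "LOC"): [1], ("PER", "PER"): [3, 4]}

def constrain_logits(logits, subj_type, obj_type):
    """Forbid relations incompatible with the entity-type pair."""
    mask = torch.full_like(logits, float("-inf"))
    allowed = compatible.get((subj_type, obj_type), [])
    mask[allowed] = 0.0
    return logits + mask          # a softmax over the result ignores invalid relations

scores = constrain_logits(torch.randn(N_REL), "PER", "ORG")
```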
{"title":"ETCGN: entity type-constrained graph networks for document-level relation extraction","authors":"Hangxiao Yang, Changpu Chen, Shaokai Zhang, Baiyang Chen, Chang Liu, Qilin Li","doi":"10.1007/s13042-024-02293-2","DOIUrl":"https://doi.org/10.1007/s13042-024-02293-2","url":null,"abstract":"<p>Document-level relation extraction aims at discerning semantic connections between entities within a given document. Compared with sentence-level relation extraction settings, the complexity of document-level relation extraction lies in necessitating models to exhibit the capability to infer semantic relations across multiple sentences. In this paper, we propose a novel model, named Entity Type-Constrained Graph Network (ETCGN). The proposed model utilizes a graph structure to capture intricate interactions among diverse mentions within the document. Moreover, it aggregates references to the same entity while integrating path-based reasoning mechanisms to deduce relations between entities. Furthermore, we present a novel constraint method that capitalizes on entity types to confine the scope of potential relations. Experimental results on two public dataset (DocRED and HacRED) show that our model outperforms a number of baselines and achieves state-of-the-art performance. Further analysis verifies the effectiveness of type-based constraints and path-based reasoning mechanisms. Our code is available at: https://github.com/yhx30/ETCGN.</p>","PeriodicalId":51327,"journal":{"name":"International Journal of Machine Learning and Cybernetics","volume":"58 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142209202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Joint feature fusion hashing for cross-modal retrieval
Yuxia Cao
Pub Date: 2024-08-20 | DOI: 10.1007/s13042-024-02309-x
Cross-modal hashing retrieval maps data from different modalities into a common low-dimensional hash-code space, enabling fast and efficient retrieval, and interest in the approach has grown in recent years. Nonetheless, many current methods overlook the influence of semantically rich features on retrieval performance. In addition, class attribute embedding is often neglected in cross-modal feature fusion, even though it is crucial for learning more discriminative hash codes. To meet these challenges, we put forward a novel method, joint feature fusion hashing (JFFH), for cross-modal retrieval. Specifically, we use the fast language-image pre-training model as the feature-encoding module for cross-modal data. To mitigate semantic disparities between modalities more effectively, we introduce a multimodal contrastive learning loss that strengthens the interaction between modalities and improves their semantic representations. In addition, we extract class attribute features as class embeddings and integrate them with cross-modal features to enhance the semantic relationships within the fused features. To better capture inter- and intra-modal dependencies as well as semantic relevance, we integrate the self-attention mechanism into the multimodal fusion transformer encoder to facilitate efficient feature fusion. Besides, we apply label-wise high-level semantic similarity and feature-wise low-level semantic similarity to enhance the discrimination of the hash codes. JFFH shows better retrieval performance on large-scale cross-modal retrieval.
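A multimodal contrastive loss of the kind described is commonly realized as a symmetric InfoNCE objective over paired image/text embeddings. A minimal sketch follows; the temperature and the symmetric form are common choices and may differ from JFFH's exact loss.

```python
# Minimal sketch: symmetric InfoNCE-style image-text contrastive loss.
import torch
import torch.nn.functional as F

def multimodal_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Pull matched image-text pairs together, push mismatched pairs apart."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.T / temperature          # (batch, batch) similarities
    targets = torch.arange(img_emb.size(0))             # i-th image matches i-th text
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2
```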
{"title":"Joint feature fusion hashing for cross-modal retrieval","authors":"Yuxia Cao","doi":"10.1007/s13042-024-02309-x","DOIUrl":"https://doi.org/10.1007/s13042-024-02309-x","url":null,"abstract":"<p>Cross-modal hashing retrieval maps data from different modalities into a common low-dimensional hash code space, enabling fast and efficient retrieval. Recently, there has been a growing interest in the cross-modal hashing retrieval approach. Nonetheless, a significant number of current methodologies overlook the influence of semantically rich features on retrieval performance. In addition, class attribute embedding is often forgotten in cross-modal feature fusion, which is crucial for learning more discriminative hash codes. To meet these challenges, we put forward a novel method, namely joint feature fusion hashing (JFFH) for cross-modal retrieval. Specifically, we use the fast language image pre-training model as the feature coding module of cross-modal data. To more effectively mitigate semantic disparities between modalities, we introduce a multimodal contrastive learning loss to strengthen the interaction between modalities and improve the semantic representation of modalities. In addition, we extract class attribute features as class embedding and integrate them with cross-modal features to enhance the semantic relationship within the fused features. To better capture both inter-modal and intra-modal dependencies as well as semantic relevance, we integrate the self-attention mechanism into the multi-modal fusion transformer encoder to facilitate efficient feature fusion. Besides, we apply label-wise high-level semantic similarity and feature-wise low-level semantic similarity to enhance the discrimination of hash codes. Our JFFH method shows better retrieval performance in large-scale cross-modal retrieval.</p>","PeriodicalId":51327,"journal":{"name":"International Journal of Machine Learning and Cybernetics","volume":"7 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142209203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}