Color attention tracking with score matching
Xuedong He, Jiehui Huang
Pub Date: 2024-08-24 | DOI: 10.1007/s13042-024-02316-y
It is common practice to use deep networks to extract deep features from RGB images. Typically, popular trackers adopt a pre-trained ResNet backbone to extract target features, achieving excellent performance. Moreover, Staple has shown that color statistics provide complementary cues, yet the combination of color statistics and deep features in a unified deep framework has rarely been reported. We therefore employ color statistics to construct color attention maps, which are encoded into the deep network to guide the generation of target-aware feature maps. Additionally, DCF-based trackers rely on an online update module to dynamically update the tracking model, so it is particularly necessary to collect reliable target samples. Hence, drawing on the idea of template matching, we design a score matching method that scores the tracked targets and has the advantage of taking the target extent into account. We conduct thorough ablation analyses of the color attention module and the score matching method to verify their effectiveness. Furthermore, our approaches are integrated into DCF frameworks to construct two new trackers, and both quantitative and qualitative results demonstrate that our trackers perform favorably against recent, far more sophisticated trackers on multiple public benchmarks.
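A color attention map of this flavor can be built from foreground/background color histograms, Staple-style. Below is a minimal sketch, assuming 16-bin per-channel RGB histograms and a Bayes-rule foreground probability; the bin count, the smoothing, and the way the map modulates deep features are illustrative choices, not the paper's exact design.

```python
# Minimal sketch: per-pixel foreground probability from fg/bg color statistics.
import numpy as np

def color_attention_map(image, fg_box, n_bins=16):
    """image: (h, w, 3) uint8; fg_box: (x0, y0, x1, y1) target box."""
    h, w, _ = image.shape
    bins = (image // (256 // n_bins)).astype(int)               # quantize colors
    idx = bins[..., 0] * n_bins**2 + bins[..., 1] * n_bins + bins[..., 2]

    x0, y0, x1, y1 = fg_box
    fg_mask = np.zeros((h, w), dtype=bool)
    fg_mask[y0:y1, x0:x1] = True

    fg_hist = np.bincount(idx[fg_mask], minlength=n_bins**3) + 1.0   # Laplace smoothing
    bg_hist = np.bincount(idx[~fg_mask], minlength=n_bins**3) + 1.0
    fg_hist /= fg_hist.sum()
    bg_hist /= bg_hist.sum()

    # P(fg | color) via Bayes' rule with a uniform prior
    return fg_hist[idx] / (fg_hist[idx] + bg_hist[idx])          # (h, w) map in [0, 1]

# The map can then modulate deep feature maps, e.g. features * attention[None].
```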
{"title":"Color attention tracking with score matching","authors":"Xuedong He, Jiehui Huang","doi":"10.1007/s13042-024-02316-y","DOIUrl":"https://doi.org/10.1007/s13042-024-02316-y","url":null,"abstract":"<p>It is an ordinary practice that deep networks are utilized to extract deep features from RGB images. Typically, the popular trackers adopt pre-trained ResNet as a backbone to extract target features, achieving excellent performance. Moreover, Staple has shown that color statistics have complementary cues, while the combination of color statistics and deep features in a unified deep framework has rarely been reported. Therefore, we employ color statistics to construct color attention maps, which are encoded into the deep network to guide the generation of target-aware feature maps. Additionally, DCF-based trackers have an online update module to dynamically update the tracking model, it is particularly necessary to collect reliable target samples. Hence, we refer to the template matching thought to design a score matching method, which is intended to score the tracked targets, this method has the advantage of considering the target extent. In this paper, we conduct sufficient ablation analyses on the color attention module and score matching method to verify their effectiveness. Furthermore, our approaches are combined into the DCF frameworks to construct two brand-new trackers, and both quantitative and qualitative results demonstrate that our trackers can perform favorably against recent and far more sophisticated trackers on multiple public benchmarks.</p>","PeriodicalId":51327,"journal":{"name":"International Journal of Machine Learning and Cybernetics","volume":"62 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142209179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dap-SiMT: divergence-based adaptive policy for simultaneous machine translation
Libo Zhao, Ziqian Zeng
Pub Date: 2024-08-23 | DOI: 10.1007/s13042-024-02323-z
In the realm of Simultaneous Machine Translation (SiMT), a robust read/write (R/W) policy is essential alongside a high-quality translation model. Traditional methods typically employ either a fixed wait-k policy in sync with a wait-k translation model or an adaptive policy that is co-developed with a dedicated translation model. This study introduces a more versatile approach by decoupling the adaptive policy from the translation model. Our rationale is based on the finding that an independent multi-path wait-k model, when combined with adaptive policies utilized in advanced SiMT systems, can perform competitively. Specifically, we present DaP, a divergence-based adaptive policy, which dynamically adjusts read/write decisions for any translation model, taking into account potential divergence in translation distributions resulting from future information. Extensive experiments across multiple benchmarks reveal that our method significantly enhances the balance between translation accuracy and latency, surpassing strong baselines.
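As a rough illustration of the divergence idea: if the next-token distribution would shift noticeably once an additional source token is taken into account, the policy reads; otherwise it writes. The sketch below assumes a model exposing next-token log-probabilities; both that interface and the KL threshold are hypothetical, not DaP's exact formulation.

```python
# Minimal sketch of a divergence-based read/write decision.
import torch
import torch.nn.functional as F

def read_or_write(model, src_prefix, src_more, tgt_prefix, threshold=0.05):
    """WRITE if one more source token barely changes the prediction, else READ."""
    with torch.no_grad():
        log_p_now  = model.next_token_log_probs(src_prefix, tgt_prefix)  # hypothetical API
        log_p_more = model.next_token_log_probs(src_more, tgt_prefix)
    # KL(p_more || p_now): how much the extra source context shifts the prediction
    kl = F.kl_div(log_p_now, log_p_more.exp(), reduction="sum")
    return "READ" if kl > threshold else "WRITE"
```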
{"title":"Dap-SiMT: divergence-based adaptive policy for simultaneous machine translation","authors":"Libo Zhao, Ziqian Zeng","doi":"10.1007/s13042-024-02323-z","DOIUrl":"https://doi.org/10.1007/s13042-024-02323-z","url":null,"abstract":"<p>In the realm of Simultaneous Machine Translation (SiMT), a robust read/write (R/W) policy is essential alongside a high-quality translation model. Traditional methods typically employ either a fixed wait-<i>k</i> policy in sync with a wait-<i>k</i> translation model or an adaptive policy that is co-developed with a dedicated translation model. This study introduces a more versatile approach by decoupling the adaptive policy from the translation model. Our rationale is based on the finding that an independent multi-path wait-<i>k</i> model, when combined with adaptive policies utilized in advanced SiMT systems, can perform competitively. Specifically, we present DaP, a divergence-based adaptive policy, which dynamically adjusts read/write decisions for any translation model, taking into account potential divergence in translation distributions resulting from future information. Extensive experiments across multiple benchmarks reveal that our method significantly enhances the balance between translation accuracy and latency, surpassing strong baselines.</p>","PeriodicalId":51327,"journal":{"name":"International Journal of Machine Learning and Cybernetics","volume":"270 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142209181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
TL-LFF Net: transfer learning based lighter, faster, and frozen network for the detection of multi-scale mixed intracranial hemorrhages through genetic optimization algorithm
Lakshmi Prasanna Kothala, Sitaramanjaneya Reddy Guntur
Pub Date: 2024-08-23 | DOI: 10.1007/s13042-024-02324-y
Computed tomography (CT) is the most commonly used imaging method for intracranial hemorrhage (ICH). Although deep learning (DL) models are well suited to detecting and segmenting multi-class hemorrhages, localizing multi-scale mixed hemorrhages with limited resources, such as bounding boxes, is difficult. To address this issue, this study proposes a novel transfer learning-based TL-LFF Network. To detect multi-scale mixed hemorrhages, the proposed model employs a backbone module that extracts in-depth features from the input images and a spatial pyramid pooling faster layer that performs pooling at multiple levels. In the neck section, a path aggregation network (PANet) is used to preserve spatial information. To keep the model lightweight, the backbone and neck modules are frozen during backpropagation, which by itself lowers detection accuracy; transfer learning is therefore applied to recover detection capability while remaining lightweight, and it significantly improves the model's accuracy. In addition, a Genetic Algorithm (GA) is used to optimize the hyperparameters, with mutation generating new offspring from previous generations. The brain hemorrhage extended dataset was used to train and validate the proposed model. In terms of detection metrics and lightweight criteria, the experimental results show that the proposed model outperforms existing models. The proposed model can therefore be used in clinical settings to reduce radiologists' CT scan reading time.
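The freeze-then-transfer step might look like the following PyTorch sketch, assuming a detector that exposes backbone, neck, and head submodules; the module names and optimizer settings are illustrative, not the paper's exact configuration.

```python
# Minimal sketch: freeze backbone and neck so only the detection head is trained.
import torch

def freeze_for_transfer(detector):
    for module in (detector.backbone, detector.neck):   # hypothetical submodule names
        for p in module.parameters():
            p.requires_grad = False
    # The optimizer only sees the remaining trainable (head) parameters.
    trainable = [p for p in detector.parameters() if p.requires_grad]
    return torch.optim.SGD(trainable, lr=0.01, momentum=0.937)
```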
{"title":"TL-LFF Net: transfer learning based lighter, faster, and frozen network for the detection of multi-scale mixed intracranial hemorrhages through genetic optimization algorithm","authors":"Lakshmi Prasanna Kothala, Sitaramanjaneya Reddy Guntur","doi":"10.1007/s13042-024-02324-y","DOIUrl":"https://doi.org/10.1007/s13042-024-02324-y","url":null,"abstract":"<p>Computed tomography (CT) is the most commonly used imaging method in intracranial hemorrhage (ICH). Although deep learning (DL) models are well suited for detecting and segmenting multi-class hemorrhages, localizing multi-scale mixed hemorrhages with limited resources such as bounding boxes is difficult. To address this issue, the current study proposes a novel transfer learning-based TL-LFF Network. To detect multi-scale mixed hemorrhages, the proposed model employs a backbone module that extracts in-depth features from the input images, and a spatial pyramid pooling faster layer that performs the pooling operation at various levels. In the neck section, a path aggregated network (PANet) is used to store spatial information. Furthermore, to achieve a lightweight nature, the proposed backbone and neck modules were frozen during the backpropagation stage, resulting in a decrease in detection accuracy. To improve detection capability while remaining lightweight, a concept known as transfer learning is used. This strategy significantly improves the accuracy of the proposed model. In addition, the Genetic Algorithm (GA) concept is used to optimize the hyperparameters, where the mutation is used to develop new offspring based on previous generations. The brain hemorrhage extended dataset was used to train and validate the proposed model. In terms of detection metrics and lightweight criteria, the experimental results showed that the proposed model performed better when compared to other existing models. As a result, we can use the proposed model in the clinical implementation stage to reduce the radiologist's CT scan read time.</p>","PeriodicalId":51327,"journal":{"name":"International Journal of Machine Learning and Cybernetics","volume":"46 2 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142209184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A hierarchical dual-view model for fake news detection guided by discriminative lexicons
Sijia Yang, Xianyong Li, Yajun Du, Dong Huang, Xiaoliang Chen, Yongquan Fan, Shumin Wang
Pub Date: 2024-08-23 | DOI: 10.1007/s13042-024-02322-0
Fake news detection aims to automatically identify the credibility of source posts, mitigating potential societal harm and conserving human resources. Textual fake news detection methods can be categorized as pattern-based or fact-based. Pattern-based models focus on identifying shared writing patterns in source posts, while fact-based models leverage auxiliary external knowledge. Researchers have recently attempted to merge these two views into a comprehensive detection system, achieving superior performance to single-view methods. However, existing dual-view methods often prioritize integrating single-view methods over exploring the nuanced characteristics of both perspectives. To address this, we propose a novel hierarchical dual-view model for fake news detection guided by discriminative lexicons. First, we construct two lexicons based on the distinct word usage tendencies of fake and real news and further augment them with synonyms sourced from large language models. We then devise a hierarchical attention network to derive semantic representations of the source post, incorporating a lexicon attention loss to guide the prioritization of words from the two lexicons. Subsequently, a lexicon-guided interaction network is employed to model the relations between the source post and its relevant articles, assigning authenticity-aware weights to each article. Finally, the representations of the source post and relevant articles are concatenated for joint detection. Experimental results show that our model outperforms many competitive baselines, improving the macro F1 score by 1.1% to 10.5% on Weibo and by 3.2% to 10.8% on Twitter.
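One way to realize a lexicon attention loss is to penalize attention distributions that place little mass on lexicon words. A minimal sketch follows, assuming word-level attention weights and a binary lexicon mask; the paper's exact loss form may differ.

```python
# Minimal sketch: push attention mass toward discriminative-lexicon tokens.
import torch

def lexicon_attention_loss(attn, lexicon_mask, eps=1e-8):
    """
    attn:         (batch, seq_len) attention weights summing to 1 per row
    lexicon_mask: (batch, seq_len) 1 where the token appears in either lexicon
    """
    mass_on_lexicon = (attn * lexicon_mask).sum(dim=-1)      # (batch,)
    return -torch.log(mass_on_lexicon + eps).mean()          # maximize that mass
```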
Extraction of entity relationships serving the field of agriculture food safety regulation
Zhihua Zhao, Yiming Liu, Dongdong Lv, Ruixuan Li, Xudong Yu, Dianhui Mao
Pub Date: 2024-08-21 | DOI: 10.1007/s13042-024-02304-2
Agriculture food (agri-food) safety is closely related to all aspects of people's lives. In recent years, with the emergence of deep learning technology based on big data, relation extraction in the field of agri-food safety supervision has become a research hotspot. However, most current work merely extends relation recognition on top of the traditional named entity recognition task, which makes it difficult to establish a true 'connection' between entities and relations. The pipelined and joint extraction architectures that have emerged in this area are problematic in practice. In addition, the contextual information of the text corpus in the agri-food safety regulatory domain has not been fully utilized. To address these issues, this paper proposes a semi-joint entity relation extraction model (EB-SJRE) based on contextual entity boundary features. First, a token-pair subject-object correspondence matrix label is designed to intuitively model subject-object boundaries, which handles the complex entities of agri-food safety regulation more gracefully. Second, dynamic fine-tuning of BERT makes the text embedding more relevant to the textual context of the agri-food safety regulation domain. Finally, we introduce an attention mechanism into the token-pair tagging framework to capture deep semantic subject-object boundary associations, which neatly avoids both the exposure bias of pipeline structures and the dimensional explosion of joint extraction structures. The experimental results show that our model achieves the best F1-score of 88.71% on agri-food safety regulation domain data and F1-scores of 92.36%, 92.80%, 88.91%, and 92.21% on NYT, NYT-star, WebNLG, and WebNLG-star, respectively. This indicates that EB-SJRE generalizes well in both the agri-food safety regulatory domain and general domains.
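A token-pair subject-object correspondence matrix of this kind can be pictured as one seq_len x seq_len grid per relation, marking boundary pairs. The sketch below assumes labels at (subject start, object start) only; the paper's actual tagging scheme may mark additional boundary positions.

```python
# Minimal sketch: token-pair subject-object correspondence matrix labels.
import numpy as np

def build_token_pair_labels(seq_len, triples, n_relations):
    """triples: list of (subj_start, obj_start, relation_id) token indices."""
    labels = np.zeros((n_relations, seq_len, seq_len), dtype=np.int8)
    for subj, obj, rel in triples:
        labels[rel, subj, obj] = 1        # subject/object boundary pair for relation rel
    return labels

# Hypothetical example: token 3 (subject) relates to token 7 (object) via relation 0.
labels = build_token_pair_labels(seq_len=12, triples=[(3, 7, 0)], n_relations=4)
```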
{"title":"Extraction of entity relationships serving the field of agriculture food safety regulation","authors":"Zhihua Zhao, Yiming Liu, Dongdong Lv, Ruixuan Li, Xudong Yu, Dianhui Mao","doi":"10.1007/s13042-024-02304-2","DOIUrl":"https://doi.org/10.1007/s13042-024-02304-2","url":null,"abstract":"<p>Agriculture food (agri-food) safety is closely related to all aspects of people's lives. In recent years, with the emergence of deep learning technology based on big data, the extraction of information relations in the field of agri-food safety supervision has become a research hotspot. However, most of the current work only expands the relationship recognition based on the traditional named entity recognition task, which makes it difficult to establish a true 'connection' between entities and relationships. The pipelined and federated extraction architectures that have emerged in this area are problematic in practice. In addition, the contextual information of the text corpus in the agri-food safety regulatory domain has not been fully utilized. To address the above issues, this paper proposes a semi-joint entity relationship extraction model (EB-SJRE) based on contextual entity boundary features. Firstly, a Token pair subject-object correspondence matrix label is designed to intuitively model the subject-object boundary, which is more friendly to complex entities in the field of agri-food safety regulation. Secondly, the dynamic fine-tuning of Bert makes the text embedding more relevant to the textual context of the agri-food safety regulation domain. Finally, we introduce an attention mechanism in the Token pair tagging framework to capture deep semantic subject-object boundary association information, which cleverly solves the problem of bias exposure due to the pipeline structure and the dimensional explosion due to the joint extraction structure. The experimental results show that our model achieves the best F1-score of 88.71% on agri-food safety regulation domain data and F1-scores of 92.36%, 92.80%, 88.91%, and 92.21% on NYT, NYT-star, WebNLG, and WebNLG-star, respectively. This indicates that EB-SJRE has excellent generalization ability in both the agri-food safety regulatory and public sectors.</p>","PeriodicalId":51327,"journal":{"name":"International Journal of Machine Learning and Cybernetics","volume":"28 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142209201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anchor-based Domain Adaptive Hashing for unsupervised image retrieval
Yonghao Chen, Xiaozhao Fang, Yuanyuan Liu, Xi Hu, Na Han, Peipei Kang
Pub Date: 2024-08-21 | DOI: 10.1007/s13042-024-02298-x
Traditional image retrieval methods suffer significant performance degradation when the model is trained on one dataset but run on another. To address this issue, Domain Adaptive Retrieval (DAR) has emerged as a promising solution, specifically designed to overcome domain shifts in retrieval tasks. However, existing unsupervised DAR methods still face two primary limitations: (1) they under-explore the intrinsic structure among domains, resulting in limited generalization capabilities; and (2) the models are often too complex to be applied to large-scale datasets. To tackle these limitations, we propose a novel unsupervised DAR method named Anchor-based Domain Adaptive Hashing (ADAH). ADAH aims to exploit the commonalities among domains under the assumption that a consensus latent space exists for the source and target domains. To achieve this, an anchor-based similarity reconstruction scheme is proposed, which learns a set of domain-shared anchors and domain-specific anchor graphs and then reconstructs the similarity matrix from these anchor graphs, thereby effectively exploiting inter- and intra-domain similarity structures. Subsequently, by treating the anchor graphs as feature embeddings, we solve the Distance-Distance Difference Minimization (DDDM) problem between them and their corresponding hash codes, which preserves the similarity structure of the similarity matrix in the hash codes. Finally, a two-stage strategy is employed to derive the hash function, ensuring its effectiveness and scalability. Experimental results on four datasets demonstrate the effectiveness of the proposed method.
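The anchor-graph idea can be illustrated compactly: each sample is softly assigned to a small set of shared anchors, and cross-domain similarity is reconstructed through those assignments. A minimal sketch, assuming Gaussian-kernel affinities with row normalization; ADAH's actual anchor learning and normalization may differ.

```python
# Minimal sketch: anchor-based similarity reconstruction across two domains.
import numpy as np

def anchor_graph(features, anchors, sigma=1.0):
    """Soft assignment of each sample to a shared set of anchors."""
    d2 = ((features[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    z = np.exp(-d2 / (2 * sigma**2))
    return z / z.sum(axis=1, keepdims=True)           # (n_samples, n_anchors)

anchors = np.random.rand(10, 64)                      # domain-shared anchors
Zs = anchor_graph(np.random.rand(100, 64), anchors)   # source anchor graph
Zt = anchor_graph(np.random.rand(80, 64), anchors)    # target anchor graph
S = Zs @ Zt.T    # (100, 80): samples are similar if they agree on the anchors
```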
Single-stage zero-shot object detection network based on CLIP and pseudo-labeling
Jiafeng Li, Shengyao Sun, Kang Zhang, Jing Zhang, Li Zhuo
Pub Date: 2024-08-20 | DOI: 10.1007/s13042-024-02321-1
The detection of unknown objects is a challenging task in computer vision because, although real-world object categories are diverse, existing object-detection training sets cover only a limited number of them. Most existing approaches use two-stage networks to improve a model's ability to characterize objects of unknown classes, which leads to slow inference. To address this issue, we propose a single-stage unknown object detection method based on the contrastive language-image pre-training (CLIP) model and pseudo-labeling, called CLIP-YOLO. First, a visual language embedding alignment method is introduced, and a channel-grouped enhanced coordinate attention module is embedded into the YOLO-series detection head and feature-enhancing component to improve the model's ability to characterize and detect unknown-category objects. Second, pseudo-label generation is optimized based on the CLIP model to expand the diversity of the training set and improve coverage of unknown object categories. We validated this method on four challenging datasets: MSCOCO, ILSVRC, Visual Genome, and PASCAL VOC. The results show that our method achieves higher accuracy and faster inference, yielding better unknown-object detection performance. The source code is available at https://github.com/BJUTsipl/CLIP-YOLO.
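The CLIP-based pseudo-labeling step can be approximated as follows: crop each region proposal, score it against text prompts for candidate categories, and keep confident matches as pseudo-labels. This sketch uses OpenAI's `clip` package; the similarity threshold and category list are illustrative, and the real CLIP-YOLO pipeline is more involved.

```python
# Minimal sketch: CLIP pseudo-labeling of region-proposal crops.
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
categories = ["dog", "bicycle", "traffic light"]       # illustrative candidates
text = clip.tokenize([f"a photo of a {c}" for c in categories]).to(device)

def pseudo_label(crop: Image.Image, threshold=0.3):
    """Assign a category to a proposal crop, or None if CLIP is unsure."""
    image = preprocess(crop).unsqueeze(0).to(device)
    with torch.no_grad():
        img_f = model.encode_image(image)
        txt_f = model.encode_text(text)
        img_f = img_f / img_f.norm(dim=-1, keepdim=True)
        txt_f = txt_f / txt_f.norm(dim=-1, keepdim=True)
        sims = (img_f @ txt_f.T).squeeze(0)            # cosine similarities
    best = sims.argmax().item()
    return categories[best] if sims[best] > threshold else None
```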
{"title":"Single-stage zero-shot object detection network based on CLIP and pseudo-labeling","authors":"Jiafeng Li, Shengyao Sun, Kang Zhang, Jing Zhang, Li Zhuo","doi":"10.1007/s13042-024-02321-1","DOIUrl":"https://doi.org/10.1007/s13042-024-02321-1","url":null,"abstract":"<p>The detection of unknown objects is a challenging task in computer vision because, although there are diverse real-world detection object categories, existing object-detection training sets cover a limited number of object categories . Most existing approaches use two-stage networks to improve a model’s ability to characterize objects of unknown classes, which leads to slow inference. To address this issue, we proposed a single-stage unknown object detection method based on the contrastive language-image pre-training (CLIP) model and pseudo-labelling, called CLIP-YOLO. First, a visual language embedding alignment method is introduced and a channel-grouped enhanced coordinate attention module is embedded into a YOLO-series detection head and feature-enhancing component, to improve the model’s ability to characterize and detect unknown category objects. Second, the pseudo-labelling generation is optimized based on the CLIP model to expand the diversity of the training set and enhance the ability to cover unknown object categories. We validated this method on four challenging datasets: MSCOCO, ILSVRC, Visual Genome, and PASCAL VOC. The results show that our method can achieve higher accuracy and faster speed, so as to obtain better performance of unknown object detection. The source code is available at https://github.com/BJUTsipl/CLIP-YOLO.</p>","PeriodicalId":51327,"journal":{"name":"International Journal of Machine Learning and Cybernetics","volume":"41 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142208923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Advancing automated street crime detection: a drone-based system integrating CNN models and enhanced feature selection techniques
Lakshma Reddy Vuyyuru, NagaMalleswara Rao Purimetla, Kancharakunt Yakub Reddy, Sai Srinivas Vellela, Sk Khader Basha, Ramesh Vatambeti
Pub Date: 2024-08-20 | DOI: 10.1007/s13042-024-02315-z
This study addresses the growing challenge of escalating global crime rates by introducing an automated drone-based street crime detection system. Leveraging advanced Convolutional Neural Network (CNN) models, the system integrates several key components for analyzing images captured by drones. Initially, the Embedding Bilateral Filter (EBF) technique divides images into base and detail layers to enhance detection accuracy. The fusion model, IR with attention-based Conv-ViT, combines Inception-V3, ResNet-50, and the Convolution Vision Transformer (Conv-ViT) to capture both shape and texture details efficiently. Further enhancement comes from the Improved Shark Smell Optimization Algorithm (ISSOA), which optimizes feature selection and minimizes redundancy in image extraction. Additionally, a Multi-scale Contextual Semantic Guidance Network (MCS-GNet) ensures robust image classification by integrating features from multiple layers to prevent information loss. Evaluation on the UCF-Crime and UCSD Ped2 datasets demonstrates superior accuracy, with results of 0.783 and 0.974, respectively. This approach offers a promising alternative to the arduous, continuous task of manually monitoring security camera feeds for suspicious activity, addressing the pressing need for automated crime detection systems on a global scale.
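The base/detail split at the heart of the EBF step can be illustrated with a standard bilateral filter: the edge-preserving filtered image serves as the base layer, and the residual carries the detail. The sketch below uses OpenCV's bilateral filter as a stand-in for the paper's EBF technique; the filter parameters and file name are illustrative.

```python
# Minimal sketch: bilateral base/detail decomposition of a drone frame.
import cv2
import numpy as np

def base_detail_split(image, d=9, sigma_color=75, sigma_space=75):
    """Split an image into an edge-preserving base layer and a detail layer."""
    base = cv2.bilateralFilter(image, d, sigma_color, sigma_space)
    detail = image.astype(np.int16) - base.astype(np.int16)   # signed residual
    return base, detail

frame = cv2.imread("drone_frame.jpg")                  # hypothetical input frame
base, detail = base_detail_split(frame)
```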
{"title":"Advancing automated street crime detection: a drone-based system integrating CNN models and enhanced feature selection techniques","authors":"Lakshma Reddy Vuyyuru, NagaMalleswara Rao Purimetla, Kancharakunt Yakub Reddy, Sai Srinivas Vellela, Sk Khader Basha, Ramesh Vatambeti","doi":"10.1007/s13042-024-02315-z","DOIUrl":"https://doi.org/10.1007/s13042-024-02315-z","url":null,"abstract":"<p>This study presents a pioneering solution to the growing challenge of escalating global crime rates through the introduction of an automated drone-based street crime detection system. Leveraging advanced Convolutional Neural Network (CNN) models, the system integrates several key components for analyzing images captured by drones. Initially, the Embedding Bilateral Filter (EBF) technique divides images into base and detail layers to enhance detection accuracy. The fusion model, IR with attention-based Conv-ViT, combines Inception-V3, ResNet-50, and Convolution Vision Transformer (Conv-ViT) to capture both shape and texture details efficiently. Further enhancement is achieved through the Improved Shark Smell Optimization Algorithm (ISSOA), which optimizes feature selection and minimizes redundancy in image extraction. Additionally, a Multi-scale Contextual Semantic Guidance Network (MCS-GNet) ensures robust image classification by integrating features from multiple layers to prevent data loss. Evaluation on the UCF-Crime and UCSD Ped2 datasets demonstrates superior accuracy, with remarkable results of 0.783 and 0.974, respectively. This innovative approach offers a promising solution to the arduous and continuous task of monitoring security camera feeds for suspicious activities, thereby addressing the pressing need for automated crime detection systems on a global scale.</p>","PeriodicalId":51327,"journal":{"name":"International Journal of Machine Learning and Cybernetics","volume":"19 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142209183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ETCGN: entity type-constrained graph networks for document-level relation extraction
Hangxiao Yang, Changpu Chen, Shaokai Zhang, Baiyang Chen, Chang Liu, Qilin Li
Pub Date: 2024-08-20 | DOI: 10.1007/s13042-024-02293-2
Document-level relation extraction aims to discern semantic connections between entities within a given document. Compared with sentence-level relation extraction, its complexity lies in requiring models to infer semantic relations across multiple sentences. In this paper, we propose a novel model named Entity Type-Constrained Graph Network (ETCGN). The proposed model uses a graph structure to capture the intricate interactions among the diverse mentions within a document. Moreover, it aggregates mentions of the same entity and integrates path-based reasoning mechanisms to deduce relations between entities. Furthermore, we present a novel constraint method that capitalizes on entity types to confine the scope of potential relations. Experimental results on two public datasets (DocRED and HacRED) show that our model outperforms a number of baselines and achieves state-of-the-art performance. Further analysis verifies the effectiveness of the type-based constraints and path-based reasoning mechanisms. Our code is available at: https://github.com/yhx30/ETCGN.
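An entity-type constraint of this kind can be pictured as masking relation logits that are incompatible with the (subject type, object type) pair. A minimal sketch, assuming a hand-built compatibility table and -inf masking; ETCGN's actual mechanism may differ.

```python
# Minimal sketch: restrict relation logits by entity-type compatibility.
import torch

N_REL = 5
# compatible[(subject_type, object_type)] -> allowed relation ids (illustrative)
compatible = {("PER", "ORG"): [0, 2], ("ORG", "LOC"): [1], ("PER", "PER"): [3, 4]}

def constrain_logits(logits, subj_type, obj_type):
    """Forbid relations incompatible with the entity-type pair."""
    mask = torch.full_like(logits, float("-inf"))
    allowed = compatible.get((subj_type, obj_type), [])
    mask[allowed] = 0.0
    return logits + mask          # a softmax over the result ignores invalid relations

scores = constrain_logits(torch.randn(N_REL), "PER", "ORG")
```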
{"title":"ETCGN: entity type-constrained graph networks for document-level relation extraction","authors":"Hangxiao Yang, Changpu Chen, Shaokai Zhang, Baiyang Chen, Chang Liu, Qilin Li","doi":"10.1007/s13042-024-02293-2","DOIUrl":"https://doi.org/10.1007/s13042-024-02293-2","url":null,"abstract":"<p>Document-level relation extraction aims at discerning semantic connections between entities within a given document. Compared with sentence-level relation extraction settings, the complexity of document-level relation extraction lies in necessitating models to exhibit the capability to infer semantic relations across multiple sentences. In this paper, we propose a novel model, named Entity Type-Constrained Graph Network (ETCGN). The proposed model utilizes a graph structure to capture intricate interactions among diverse mentions within the document. Moreover, it aggregates references to the same entity while integrating path-based reasoning mechanisms to deduce relations between entities. Furthermore, we present a novel constraint method that capitalizes on entity types to confine the scope of potential relations. Experimental results on two public dataset (DocRED and HacRED) show that our model outperforms a number of baselines and achieves state-of-the-art performance. Further analysis verifies the effectiveness of type-based constraints and path-based reasoning mechanisms. Our code is available at: https://github.com/yhx30/ETCGN.</p>","PeriodicalId":51327,"journal":{"name":"International Journal of Machine Learning and Cybernetics","volume":"58 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142209202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Joint feature fusion hashing for cross-modal retrieval
Yuxia Cao
Pub Date: 2024-08-20 | DOI: 10.1007/s13042-024-02309-x
Cross-modal hashing retrieval maps data from different modalities into a common low-dimensional hash-code space, enabling fast and efficient retrieval, and interest in the approach has grown in recent years. Nonetheless, many current methods overlook the influence of semantically rich features on retrieval performance. In addition, class attribute embedding is often neglected in cross-modal feature fusion, even though it is crucial for learning more discriminative hash codes. To meet these challenges, we put forward a novel method, joint feature fusion hashing (JFFH), for cross-modal retrieval. Specifically, we use the fast language-image pre-training model as the feature-encoding module for cross-modal data. To mitigate semantic disparities between modalities more effectively, we introduce a multimodal contrastive learning loss that strengthens the interaction between modalities and improves their semantic representations. In addition, we extract class attribute features as class embeddings and integrate them with cross-modal features to enhance the semantic relationships within the fused features. To better capture inter- and intra-modal dependencies as well as semantic relevance, we integrate the self-attention mechanism into the multimodal fusion transformer encoder to facilitate efficient feature fusion. Besides, we apply label-wise high-level semantic similarity and feature-wise low-level semantic similarity to enhance the discrimination of the hash codes. JFFH shows better retrieval performance on large-scale cross-modal retrieval.
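A multimodal contrastive loss of the kind described is commonly realized as a symmetric InfoNCE objective over paired image/text embeddings. A minimal sketch follows; the temperature and the symmetric form are common choices and may differ from JFFH's exact loss.

```python
# Minimal sketch: symmetric InfoNCE-style image-text contrastive loss.
import torch
import torch.nn.functional as F

def multimodal_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Pull matched image-text pairs together, push mismatched pairs apart."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.T / temperature          # (batch, batch) similarities
    targets = torch.arange(img_emb.size(0))             # i-th image matches i-th text
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2
```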
{"title":"Joint feature fusion hashing for cross-modal retrieval","authors":"Yuxia Cao","doi":"10.1007/s13042-024-02309-x","DOIUrl":"https://doi.org/10.1007/s13042-024-02309-x","url":null,"abstract":"<p>Cross-modal hashing retrieval maps data from different modalities into a common low-dimensional hash code space, enabling fast and efficient retrieval. Recently, there has been a growing interest in the cross-modal hashing retrieval approach. Nonetheless, a significant number of current methodologies overlook the influence of semantically rich features on retrieval performance. In addition, class attribute embedding is often forgotten in cross-modal feature fusion, which is crucial for learning more discriminative hash codes. To meet these challenges, we put forward a novel method, namely joint feature fusion hashing (JFFH) for cross-modal retrieval. Specifically, we use the fast language image pre-training model as the feature coding module of cross-modal data. To more effectively mitigate semantic disparities between modalities, we introduce a multimodal contrastive learning loss to strengthen the interaction between modalities and improve the semantic representation of modalities. In addition, we extract class attribute features as class embedding and integrate them with cross-modal features to enhance the semantic relationship within the fused features. To better capture both inter-modal and intra-modal dependencies as well as semantic relevance, we integrate the self-attention mechanism into the multi-modal fusion transformer encoder to facilitate efficient feature fusion. Besides, we apply label-wise high-level semantic similarity and feature-wise low-level semantic similarity to enhance the discrimination of hash codes. Our JFFH method shows better retrieval performance in large-scale cross-modal retrieval.</p>","PeriodicalId":51327,"journal":{"name":"International Journal of Machine Learning and Cybernetics","volume":"7 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142209203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}