Yiyu Chen, Dongyang Fu, Difeng Wang, Haoen Huang, Yang Si, Shangfeng Du
Achieving high-precision extraction of sea islands from high-resolution satellite remote sensing images is crucial for effective resource development and sustainable management. Unfortunately, achieving such accuracy for sea island extraction presents significant challenges due to the presence of extensive background interference. A more widely applicable noise-tolerant matched filter (NTMF) scheme is proposed for sea island extraction based on the conventional matched filter (MF) scheme. The NTMF scheme effectively suppresses the background interference, leading to more accurate and robust sea island extraction. To further enhance the accuracy and robustness of the NTMF scheme, a neural dynamics algorithm is supplemented that adds an error integration feedback term to counter noise interference during internal computer operations in practical applications. Several comparative experiments were conducted on various remote sensing images of sea islands under different noisy working conditions to demonstrate the superiority of the proposed neural dynamics algorithm-assisted NTMF scheme. These experiments confirm the advantages of using the NTMF scheme for sea island extraction with the assistance of the neural dynamics algorithm.
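The error-integration feedback idea can be illustrated with a minimal, self-contained sketch (not the authors' NTMF model): a scalar state is driven toward a target by first-order dynamics, with a constant disturbance standing in for internal computational noise. The `gamma`, `lam`, and `bias` values are arbitrary illustration choices.

```python
def track(target, steps=2000, dt=1e-3, bias=0.5, gamma=50.0, lam=200.0, integral=True):
    """Euler-discretised dynamics x' = -gamma*e - lam*int(e) + bias, e = x - target.

    Plain proportional feedback (-gamma*e) settles at a biased fixed point
    e = bias/gamma; adding the error-integral term drives the residual
    error towards zero despite the constant disturbance.
    """
    x, acc = 0.0, 0.0
    for _ in range(steps):
        e = x - target
        acc += e * dt                                   # running error integral
        dx = -gamma * e - (lam * acc if integral else 0.0) + bias
        x += dx * dt
    return x
```

With the integral term the steady-state error vanishes; without it a bias of roughly `bias/gamma` remains, which mirrors why the neural dynamics algorithm adds the error-integration feedback.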
"Noise-tolerant matched filter scheme supplemented with neural dynamics algorithm for sea island extraction," CAAI Transactions on Intelligence Technology, vol. 9, no. 4, pp. 996–1013, 22 March 2024. DOI: 10.1049/cit2.12323. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12323
The attention mechanism has been a successful method for multimodal affective analysis in recent years. Despite the advances, several significant challenges remain in fusing language and its nonverbal context information. One is generating sparse attention coefficients associated with acoustic and visual modalities, which helps locate critical emotional semantics. The other is fusing complementary cross-modal representations to construct optimal salient feature combinations of multiple modalities. A Conditional Transformer Fusion Network is proposed to handle these problems. Firstly, the authors equip the transformer module with CNN layers to enhance the detection of subtle signal patterns in nonverbal sequences. Secondly, sentiment words are utilised as context conditions to guide the computation of cross-modal attention. As a result, the located nonverbal features are not only salient but also directly complementary to the sentiment words. Experimental results show that the authors' method achieves state-of-the-art performance on several multimodal affective analysis datasets.
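A minimal sketch of the condition-guided cross-modal attention idea (illustrative only, not the authors' network): the query is a sentiment-word embedding acting as the condition, while the keys and values come from a nonverbal (e.g. acoustic) feature sequence.

```python
import numpy as np

def conditional_attention(condition, keys, values):
    """Single-head dot-product attention with a sentiment-word embedding
    as the query ('condition'); keys/values are a nonverbal sequence.

    Returns the attended feature vector and the attention weights, so
    frames aligned with the sentiment word receive the largest weight.
    """
    d = keys.shape[-1]
    scores = keys @ condition / np.sqrt(d)   # (T,) similarity to the condition
    w = np.exp(scores - scores.max())        # numerically stable softmax
    w /= w.sum()
    return w @ values, w
```

Frames whose key vectors align with the conditioning sentiment word dominate the output, which is the intuition behind locating nonverbal features that are complementary to sentiment words.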
Jianwen Wang, Shiping Wang, Shunxin Xiao, Renjie Lin, Mianxiong Dong, Wenzhong Guo, "Conditional selection with CNN augmented transformer for multimodal affective analysis," CAAI Transactions on Intelligence Technology, vol. 9, no. 4, pp. 917–931, 22 March 2024. DOI: 10.1049/cit2.12320. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12320
Xiali Li, Bo Liu, Zhi Wei, Zhaoqi Wang, Licheng Wu
Mahjong, a complex game with hidden information and sparse rewards, poses significant challenges. Existing Mahjong AIs require substantial hardware resources and extensive datasets to enhance AI capabilities. The authors propose a transformer-based Mahjong AI (Tjong) via hierarchical decision-making. By utilising self-attention mechanisms, Tjong effectively captures tile patterns and game dynamics, and it decouples the decision process into two distinct stages: action decision and tile decision. This design reduces decision complexity considerably. Additionally, a fan backward technique is proposed to address the sparse rewards by allocating reversed rewards for actions based on winning hands. Tjong consists of 15M parameters and is trained on approximately 0.5M data samples over 7 days of supervised learning on a single server with 2 GPUs. The action decision achieved an accuracy of 94.63%, while the claim decision attained 98.55% and the discard decision reached 81.51%. In a tournament format, Tjong outperformed AIs (CNN, MLP, RNN, ResNet, ViT), achieving scores up to 230% higher than its opponents. Furthermore, after 3 days of reinforcement learning training, it ranked within the top 1% on the leaderboard on the Botzone platform.
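The fan backward idea, allocating the terminal winning-hand (fan) score backwards over the action sequence, can be sketched as a simple discounted assignment. The geometric discount and exact allocation rule here are illustrative assumptions, not the paper's formula.

```python
def fan_backward(num_actions, fan_score, gamma=0.9):
    """Allocate the terminal fan score backwards over an action sequence.

    The final action receives the full score; each earlier action gets a
    geometrically discounted share, turning a single sparse terminal
    reward into a dense per-action signal.
    """
    return [fan_score * gamma ** (num_actions - 1 - t) for t in range(num_actions)]
```

Actions closer to the winning hand receive larger rewards, which credits the decisions that most directly contributed to the win.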
"Tjong: A transformer-based Mahjong AI via hierarchical decision-making and fan backward," CAAI Transactions on Intelligence Technology, vol. 9, no. 4, pp. 982–995, 21 March 2024. DOI: 10.1049/cit2.12298. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12298
Jing Li, Dezheng Zhang, Yonghong Xie, Aziguli Wulamu, Yao Zhang
Sentiment analysis is a fine-grained analysis task that aims to identify the sentiment polarity of a specified sentence. Existing methods in Chinese sentiment analysis tasks only consider sentiment features from a single pole and scale and thus cannot fully exploit and utilise sentiment feature information, making their performance less than ideal. To resolve the problem, the authors propose a new method, GP-FMLNet, that integrates both glyph and phonetic information and designs a novel feature matrix learning process for phonetic features with which to model words that have the same pinyin but different glyphs. The method addresses the problem of misspelled words influencing sentiment polarity prediction results. Specifically, the authors iteratively mine character, glyph, and pinyin features from the input comment sentences. Then, the authors use soft attention and matrix compound modules to model the phonetic features, which enables the model to keep focusing on context-dependent sentiment words in various positions and to suppress the influence of misleading ones. Experiments on six public datasets prove that the proposed model fully utilises the glyph and phonetic information and improves on the performance of existing Chinese sentiment analysis algorithms.
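A toy illustration of combining the three feature channels with soft attention (the paper's matrix compound module is more elaborate; the channel-level softmax here is a deliberate simplification):

```python
import numpy as np

def fuse_features(char_f, glyph_f, pinyin_f, logits):
    """Channel-level soft attention over character, glyph and pinyin features.

    Softmax the three (learnable) logits into weights, then return the
    weighted sum of the three feature vectors, so the model can emphasise
    whichever channel is most informative for the current word.
    """
    a = np.exp(logits - np.max(logits))   # stable softmax over 3 channels
    a = a / a.sum()
    return a[0] * char_f + a[1] * glyph_f + a[2] * pinyin_f
```

With a strong logit on the pinyin channel, the fused vector is pulled toward the phonetic representation, which is how such weighting lets phonetic evidence dominate when glyphs are misspelled.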
"GP-FMLNet: A feature matrix learning network enhanced by glyph and phonetic information for Chinese sentiment analysis," CAAI Transactions on Intelligence Technology, vol. 9, no. 4, pp. 960–972, 19 March 2024. DOI: 10.1049/cit2.12300. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12300
Yunkai Lyu, Xiaobing Yang, Ai Guan, Jingwen Wang, Leni Dai
It is important for construction personnel to observe the dress code; in particular, correctly wearing safety helmets and reflective vests helps protect workers' lives and construction safety. A YOLO network-based detection algorithm is proposed for the construction personnel dress code (YOLO-CPDC). Firstly, Multi-Head Self-Attention (MHSA) is introduced into the backbone network to build a hybrid backbone, called Convolution MHSA Network (CMNet). The CMNet gives the model a global field of view and enhances its ability to detect small and occluded targets. Secondly, an efficient and lightweight convolution module, named Ghost Shuffle Attention-Conv-BN-SiLU (GSA-CBS), is designed and used in the neck network. The resulting GSANeck network reduces the model size without affecting performance. Finally, the SIoU loss is used in the loss function and Soft-NMS is used for post-processing. Experimental results on a self-constructed dataset show that the YOLO-CPDC algorithm has higher detection accuracy than current methods: YOLO-CPDC achieves an mAP50 of 93.6%. Compared with YOLOv5s, the number of parameters is reduced by 18% and the mAP50 is improved by 1.1%. Overall, this research effectively meets the practical demand for dress code detection in construction scenes.
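Soft-NMS, the post-processing step mentioned above, decays the scores of overlapping boxes instead of discarding them outright. A representative sketch using the Gaussian decay variant follows; whether the paper uses Gaussian or linear decay is not stated, so this is an assumption.

```python
import numpy as np

def _iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay overlapping scores by exp(-IoU^2 / sigma)
    instead of deleting them, so heavily occluded targets can survive."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()
    keep, idxs = [], list(range(len(scores)))
    while idxs:
        m = max(idxs, key=lambda i: scores[i])   # highest remaining score
        keep.append(m)
        idxs.remove(m)
        for i in idxs:                           # decay overlapping neighbours
            scores[i] *= np.exp(-_iou(boxes[m], boxes[i]) ** 2 / sigma)
        idxs = [i for i in idxs if scores[i] > score_thresh]
    return keep
```

Unlike hard NMS, a box that overlaps a kept detection is only demoted in rank, which suits crowded construction scenes where workers occlude each other.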
"Construction personnel dress code detection based on YOLO framework," CAAI Transactions on Intelligence Technology, vol. 9, no. 3, pp. 709–721, 18 March 2024. DOI: 10.1049/cit2.12312. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12312
Offline reinforcement learning (RL) aims to learn policies entirely from passively collected datasets, making it a data-driven decision method. One of the main challenges in offline RL is the distribution shift problem, which causes the algorithm to visit out-of-distribution (OOD) samples. The distribution shift can be mitigated by constraining the divergence between the target policy and the behaviour policy. However, this method can overly constrain the target policy and impair the algorithm's performance, as it does not directly distinguish between in-distribution and OOD samples. In addition, it is difficult to learn and represent a multi-modal behaviour policy when the datasets are collected by several different behaviour policies. To overcome these drawbacks, the authors address the distribution shift problem with implicit policy constraints based on energy-based models (EBMs) rather than explicitly modelling the behaviour policy. The EBM is powerful for representing complex multi-modal distributions and for distinguishing in-distribution samples from OOD ones. Experimental results show that their method significantly outperforms the explicit policy constraint method and other baselines. In addition, the learnt energy model can be used to indicate OOD visits and warn of possible failures.
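The paper trains EBMs on state-action pairs; as a stand-in, the sketch below uses a kernel-density estimate to define an energy that is low in-distribution and high for OOD pairs, illustrating only the energy-as-OOD-indicator idea, not the authors' training procedure.

```python
import numpy as np

def energy(sa, data, bandwidth=1.0):
    """Toy energy: negative log of a Gaussian kernel-density estimate over
    dataset state-action pairs. In-distribution points score low energy,
    out-of-distribution points score high energy."""
    d2 = ((data - sa) ** 2).sum(axis=1)
    return -np.log(np.exp(-d2 / (2.0 * bandwidth ** 2)).mean() + 1e-12)

def is_ood(sa, data, threshold):
    """Flag a state-action pair as out-of-distribution when its energy
    exceeds a threshold calibrated on the dataset."""
    return energy(sa, data) > threshold
```

A real EBM would replace the kernel-density estimate with a learnt network, but the usage pattern is the same: compare the pair's energy against a calibrated threshold to warn of OOD visits.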
Zhiyong Peng, Yadong Liu, Changlin Han, Zongtan Zhou, "Implicit policy constraint for offline reinforcement learning," CAAI Transactions on Intelligence Technology, vol. 9, no. 4, pp. 973–981, 15 March 2024. DOI: 10.1049/cit2.12304. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12304
Hengame Ahmadi Golilarz, Alireza Azadbar, Roohallah Alizadehsani, Juan Manuel Gorriz
Myocarditis is a significant public health concern because of its potential to cause heart failure and sudden death. The standard invasive diagnostic method, endomyocardial biopsy, is typically reserved for cases with severe complications, limiting its widespread use. Conversely, non-invasive cardiac magnetic resonance (CMR) imaging presents a promising alternative for detecting and monitoring myocarditis, because of its high signal contrast that reveals myocardial involvement. To assist medical professionals via artificial intelligence, the authors introduce generative adversarial networks - multi discriminator (GAN-MD), a deep learning model that uses binary classification to diagnose myocarditis from CMR images. Their approach employs a series of convolutional neural networks (CNNs) that extract and combine feature vectors for accurate diagnosis. The authors suggest a novel technique for improving the classification precision of CNNs. Using generative adversarial networks (GANs) to create synthetic images for data augmentation, the authors address challenges such as mode collapse and unstable training. Incorporating a reconstruction loss into the GAN loss function requires the generator to produce images reflecting the discriminator features, thus enhancing the generated images' quality to more accurately replicate authentic data patterns. Moreover, combining this loss function with other regularisation methods, such as gradient penalty, has proven to further improve the performance of diverse GAN models. A significant challenge in myocarditis diagnosis is the imbalance of classification, where one class dominates over the other. To mitigate this, the authors introduce a focal loss-based training method that effectively trains the model on the minority class samples. 
The GAN-MD approach, evaluated on the Z-Alizadeh Sani myocarditis dataset, achieves superior results (F-measure 86.2%; geometric mean 91.0%) compared with other deep learning models and traditional machine learning methods.
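The binary focal loss used for the class-imbalance problem can be sketched directly; the `alpha` and `gamma` values below are the common defaults from the focal loss literature, not necessarily the paper's settings.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: cross-entropy scaled by (1 - p_t)**gamma.

    Easy, well-classified examples (p_t near 1) are down-weighted, so
    training gradient is dominated by hard, typically minority-class
    samples. p is the predicted probability of class 1, y the label.
    """
    p_t = np.where(y == 1, p, 1 - p)            # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)  # class-balancing weight
    return -(alpha_t * (1 - p_t) ** gamma * np.log(p_t + 1e-12))
```

An easy positive (p = 0.9) contributes orders of magnitude less loss than a hard one (p = 0.1), which is exactly the mechanism that keeps the minority class from being ignored.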
"GAN-MD: A myocarditis detection using multi-channel convolutional neural networks and generative adversarial network-based data augmentation," CAAI Transactions on Intelligence Technology, vol. 9, no. 4, pp. 866–878, 14 March 2024. DOI: 10.1049/cit2.12307. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12307
Tajinder Kumar, Purushottam Sharma, Jaswinder Tanwar, Hisham Alsghier, Shashi Bhushan, Hesham Alhumyani, Vivek Sharma, Ahmed I. Alutaibi
Cloud computing has drastically changed the delivery and consumption of live streaming content. The designs, challenges, and possible uses of cloud computing for live streaming are studied. A comprehensive overview of the technical and business issues surrounding cloud-based live streaming is provided, including the benefits of cloud computing, the various live streaming architectures, and the challenges that live streaming service providers face in delivering high-quality, real-time services. The different techniques used to improve the performance of video streaming, such as adaptive bit-rate streaming, multicast distribution, and edge computing, are discussed, and the necessity of low-latency, high-quality video transmission in cloud-based live streaming is underlined. Issues such as improving user experience and live streaming service performance using cutting-edge technologies, such as artificial intelligence and machine learning, are discussed. In addition, the legal and regulatory implications of cloud-based live streaming, including issues with network neutrality, data privacy, and content moderation, are addressed. The future of cloud computing for live streaming is then examined, looking at the most likely new developments in trends and technology. For technology vendors, live streaming service providers, and regulators, the findings have major policy-relevant implications. Suggestions on how stakeholders should address these concerns and take advantage of the potential presented by this rapidly evolving sector are provided, along with insights into the key challenges and opportunities associated with cloud-based live streaming.
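The core of throughput-based adaptive bit-rate streaming, one of the techniques discussed, reduces to picking the highest rung of the bitrate ladder that fits the measured bandwidth. The ladder values and safety margin below are illustrative, not taken from any specific service.

```python
def select_bitrate(throughput_kbps, ladder=(500, 1200, 2500, 5000), safety=0.8):
    """Throughput-based ABR rule: choose the highest bitrate rung that fits
    within a safety fraction of measured throughput; if even the lowest
    rung does not fit, fall back to it to keep playback alive."""
    budget = throughput_kbps * safety
    eligible = [b for b in ladder if b <= budget]
    return max(eligible) if eligible else min(ladder)
```

Production players (e.g. HLS or DASH clients) layer buffer-occupancy signals and smoothing on top of this rule, but the bandwidth-fitting step is the same.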
"Cloud-based video streaming services: Trends, challenges, and opportunities," CAAI Transactions on Intelligence Technology, vol. 9, no. 2, pp. 265–285, 14 March 2024. DOI: 10.1049/cit2.12299. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12299
Yang Fang, Bailian Xie, Uswah Khairuddin, Zijian Min, Bingbing Jiang, Weisheng Li
Transformer tracking always takes paired template and search images as encoder input and conducts feature extraction and target-search feature correlation by self- and/or cross-attention operations, so the model complexity grows quadratically with the number of input images. To alleviate the burden of this tracking paradigm and facilitate the practical deployment of Transformer-based trackers, we propose a dual pooling transformer tracking framework, dubbed DPT, which consists of three components: a simple yet efficient spatiotemporal attention model (SAM), a mutual correlation pooling Transformer (MCPT) and a multiscale aggregation pooling Transformer (MAPT). SAM is designed to gracefully aggregate temporal dynamics and spatial appearance information of multi-frame templates along space-time dimensions. MCPT aims to capture multi-scale pooled and correlated contextual features and is followed by MAPT, which aggregates the multi-scale features into a unified feature representation for tracking prediction. The DPT tracker achieves an AUC score of 69.5 on LaSOT and a precision score of 82.8 on TrackingNet while maintaining a shorter attention-token sequence length and fewer parameters and FLOPs than existing state-of-the-art (SOTA) Transformer tracking methods. Extensive experiments demonstrate that the DPT tracker yields a strong real-time tracking baseline with a good trade-off between tracking performance and inference efficiency.
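The quadratic-complexity claim can be made concrete with a back-of-the-envelope FLOP model (a rough approximation that ignores projections and softmax): doubling the token count quadruples attention cost, while pooling the token grid by a factor `p` along each spatial axis divides the token count by p² and the quadratic attention cost by p⁴.

```python
def attention_flops(tokens, dim):
    """Rough multiply-add count for one self-attention layer: the QK^T and
    AV matrix products each cost about tokens^2 * dim operations."""
    return 2 * tokens ** 2 * dim

def pooled_speedup(tokens, dim, pool=2):
    """Cost ratio when the full token grid is pooled by `pool` along each
    of its two spatial axes (token count divided by pool**2)."""
    return attention_flops(tokens, dim) / attention_flops(tokens // pool ** 2, dim)
```

This is the arithmetic motivation for pooling-based designs like DPT: shrinking the attended token set buys a superlinear reduction in attention FLOPs.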
{"title":"DPT-tracker: Dual pooling transformer for efficient visual tracking","authors":"Yang Fang, Bailian Xie, Uswah Khairuddin, Zijian Min, Bingbing Jiang, Weisheng Li","doi":"10.1049/cit2.12296","DOIUrl":"10.1049/cit2.12296","url":null,"abstract":"<p>Transformer tracking always takes paired template and search images as encoder input and conducts feature extraction and target-search feature correlation through self- and/or cross-attention operations, so model complexity grows quadratically with the number of input images. To alleviate the burden of this tracking paradigm and facilitate the practical deployment of Transformer-based trackers, we propose a dual pooling transformer tracking framework, dubbed DPT, which consists of three components: a simple yet efficient spatiotemporal attention model (SAM), a mutual correlation pooling Transformer (MCPT) and a multi-scale aggregation pooling Transformer (MAPT). SAM is designed to gracefully aggregate the temporal dynamics and spatial appearance information of multi-frame templates along the space-time dimensions. MCPT aims to capture multi-scale pooled and correlated contextual features, and is followed by MAPT, which aggregates the multi-scale features into a unified representation for tracking prediction. The DPT tracker achieves an AUC score of 69.5 on LaSOT and a precision score of 82.8 on TrackingNet while maintaining a shorter sequence length of attention tokens and fewer parameters and FLOPs than existing state-of-the-art (SOTA) Transformer tracking methods. 
Extensive experiments demonstrate that DPT tracker yields a strong real-time tracking baseline with a good trade-off between tracking performance and inference efficiency.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"9 4","pages":"948-959"},"PeriodicalIF":8.4,"publicationDate":"2024-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12296","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140244948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
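The quadratic growth in attention cost that motivates DPT can be made concrete with a small NumPy sketch. The average pooling of key/value tokens below (with hypothetical shapes and a plain single-head formulation) is a deliberate simplification of the general pooling-Transformer idea, not the authors' MCPT/MAPT design:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pooled_attention(q, k, v, pool=2):
    """Single-head attention with average-pooled keys/values.

    Pooling the n key/value tokens down to n // pool shrinks the
    attention matrix from n x n to n x (n // pool), which is the
    complexity-reduction idea behind pooling Transformers.
    """
    n, d = k.shape
    m = n // pool
    k_p = k[: m * pool].reshape(m, pool, d).mean(axis=1)
    v_p = v[: m * pool].reshape(m, pool, d).mean(axis=1)
    scores = q @ k_p.T / np.sqrt(d)   # shape (n_q, m) instead of (n_q, n)
    return softmax(scores) @ v_p

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 16)) for _ in range(3))
out = pooled_attention(q, k, v, pool=2)
print(out.shape)  # → (8, 16)
```

With `pool=1` the function reduces to ordinary scaled dot-product attention, so the pooling factor directly trades attention-token sequence length against fidelity.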
Li Ying, Duoqian Miao, Zhifei Zhang, Hongyun Zhang, Witold Pedrycz
Owing to their high resolution and rich texture information, visible light images are widely used for maritime ship detection. However, these images are susceptible to sea fog and to ships of different sizes, which can cause missed detections and false alarms and ultimately lower detection accuracy. To address these issues, a novel multi-granularity feature enhancement network, MFENet, is proposed, which comprises a three-way dehazing module (3WDM) and a multi-granularity feature enhancement module (MFEM). The 3WDM eliminates sea fog interference by using an automatic image clarity classification algorithm based on three-way decisions and FFA-Net to obtain clear image samples. Additionally, the MFEM improves the accuracy of detecting ships of different sizes by utilising an improved super-resolution reconstruction convolutional neural network to enhance the resolution and semantic representation capability of the feature maps from YOLOv7. Experimental results demonstrate that MFENet surpasses the 15 competing models in terms of the mean Average Precision metric on two benchmark datasets, achieving 96.28% on the McShips dataset and 97.71% on the SeaShips dataset.
{"title":"Multi-granularity feature enhancement network for maritime ship detection","authors":"Li Ying, Duoqian Miao, Zhifei Zhang, Hongyun Zhang, Witold Pedrycz","doi":"10.1049/cit2.12310","DOIUrl":"10.1049/cit2.12310","url":null,"abstract":"<p>Owing to their high resolution and rich texture information, visible light images are widely used for maritime ship detection. However, these images are susceptible to sea fog and to ships of different sizes, which can cause missed detections and false alarms and ultimately lower detection accuracy. To address these issues, a novel multi-granularity feature enhancement network, MFENet, is proposed, which comprises a three-way dehazing module (3WDM) and a multi-granularity feature enhancement module (MFEM). The 3WDM eliminates sea fog interference by using an automatic image clarity classification algorithm based on three-way decisions and FFA-Net to obtain clear image samples. Additionally, the MFEM improves the accuracy of detecting ships of different sizes by utilising an improved super-resolution reconstruction convolutional neural network to enhance the resolution and semantic representation capability of the feature maps from YOLOv7. 
Experimental results demonstrate that MFENet surpasses the other 15 competing models in terms of the mean Average Precision metric on two benchmark datasets, achieving 96.28% on the McShips dataset and 97.71% on the SeaShips dataset.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"9 3","pages":"649-664"},"PeriodicalIF":5.1,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12310","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140249217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
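The mean Average Precision (mAP) figures reported above are computed by matching predicted boxes to ground-truth boxes via Intersection over Union (IoU). As a generic illustration of that matching criterion (not code from MFENet or its evaluation protocol):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A detection typically counts as a true positive when IoU >= 0.5;
# two unit-offset 2x2 boxes overlap in a 1x1 square: 1 / (4 + 4 - 1).
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # → 1/7 ≈ 0.143
```

Average Precision then sweeps the detector's confidence threshold and integrates precision over recall; mAP averages that quantity over classes (and, in some protocols, over IoU thresholds).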