Lijia Deng, Qinghua Zhou, Shuihua Wang, Juan Manuel Górriz, Yudong Zhang
Counting high-density objects quickly and accurately is a popular area of research. Crowd counting has significant social and economic value and is a major focus in artificial intelligence. Despite many advancements in this field, much of this progress is not widely known, especially in terms of research data. The authors proposed a three-tier standardised dataset taxonomy (TSDT), which divides datasets into small-scale, large-scale and hyper-scale according to different application scenarios. This taxonomy can help researchers make more efficient use of datasets and improve the performance of AI algorithms in specific fields. Additionally, the authors proposed a new evaluation index for the clarity of a dataset: the average pixels occupied by each object (APO). This index is better suited than image resolution for evaluating dataset clarity in object counting tasks. Moreover, the authors classified crowd counting methods from a data-driven perspective into multi-scale networks, single-column networks, multi-column networks, multi-task networks, attention networks and weakly supervised networks, and introduced the classic crowd counting methods of each class. The authors classified the existing 36 datasets according to the TSDT and discussed and evaluated them, and assessed the performance of more than 100 methods from the past five years on popular datasets at each level. Recently, progress on small-scale datasets has slowed: few new datasets or algorithms target them, while studies focused on large- or hyper-scale datasets appear to be reaching a saturation point. The combined use of multiple approaches has become a major research direction. The authors discussed the theoretical and practical challenges of crowd counting from the perspectives of data, algorithms and computing resources.
The field of crowd counting is moving towards combining multiple methods and requires fresh, targeted datasets. Despite advancements, the field still faces challenges such as handling real-world scenarios and processing large crowds in real-time. Researchers are exploring transfer learning to overcome the limitations of small datasets. The development of effective algorithms for crowd counting remains a challenging and important task in computer vision and AI, with many opportunities for future research.
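The APO index can be read as a simple dataset statistic. A minimal sketch, assuming APO averages each image's pixel area over its annotated object count; this is our reading of the abstract, not the paper's exact formula:

```python
def average_pixels_per_object(samples):
    # samples: one (width, height, object_count) triple per image.
    # APO here = mean over images of (pixel area / annotated objects);
    # fewer pixels per object means a harder, lower-clarity dataset.
    ratios = [(w * h) / n for (w, h, n) in samples if n > 0]
    return sum(ratios) / len(ratios)
```

Unlike raw image resolution, such a statistic falls as crowds grow denser, which is why it tracks counting difficulty more directly.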
Deep learning in crowd counting: A survey. CAAI Transactions on Intelligence Technology, 9(5): 1043–1077, 2023. DOI: 10.1049/cit2.12241.
Three-way concept analysis is an important tool for information processing, and rule acquisition is one of the research hotspots of three-way concept analysis. Compared with three-way concept lattices, three-way semi-concept lattices have three-way operators with weaker constraints, which can generate more concepts. In this article, the problem of rule acquisition for three-way semi-concept lattices is discussed. The authors construct the finer relation of three-way semi-concept lattices and propose a method of rule acquisition for them. The authors also discuss the set of decision rules and the relationships among decision rules for object-induced three-way semi-concept lattices, object-induced three-way concept lattices, classical concept lattices and semi-concept lattices. Finally, examples are provided to illustrate the validity of the authors' conclusions.
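For readers unfamiliar with concept lattices, the classical derivation operator underlying (semi-)concepts can be sketched in a few lines; the three-way operators pair this with a second, negative operator over the complement relation, which this sketch omits:

```python
def intent(objects, context):
    # Classical derivation operator from formal concept analysis: the set
    # of attributes shared by every object in `objects`. `context` maps
    # each object to its attribute set. A (classical) concept is a pair
    # (X, Y) with intent(X) == Y and the dual operator mapping Y back to X.
    attrs = None
    for obj in objects:
        row = set(context[obj])
        attrs = row if attrs is None else attrs & row
    return attrs if attrs is not None else set()
```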
Rule acquisition of three-way semi-concept lattices in formal decision context. CAAI Transactions on Intelligence Technology, 9(2): 333–347, 2023. DOI: 10.1049/cit2.12248.
Wenhao Xue, Yang Yang, Lei Li, Zhongling Huang, Xinggang Wang, Junwei Han, Dingwen Zhang
Segmenting the semantic regions of point clouds is a crucial step for intelligent agents to understand 3D scenes. Weakly supervised point cloud segmentation is highly desirable because entirely labelling point clouds is extremely time-consuming and costly. For low-cost labelling of 3D point clouds, the scene-level label is one of the most effortless labelling strategies. However, due to the limited discriminative capability of the classifier and the orderless and structureless nature of point cloud data, existing scene-level methods struggle to transfer semantic information, which usually leads to under-activation or over-activation issues. To this end, a local semantic embedding network is introduced to learn local structural patterns and semantic propagation. Specifically, the proposed network contains graph convolution-based dilation and erosion embedding modules to implement ‘inside-out’ and ‘outside-in’ semantic information dissemination pathways. Therefore, the proposed weakly supervised learning framework achieves mutual propagation of semantic information between the foreground and background. Comprehensive experiments on the widely used ScanNet benchmark demonstrate the superior capacity of the proposed approach compared to current alternatives and baseline models.
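The dilation and erosion embedding modules are learnt graph convolutions, but their morphological intuition can be sketched with plain max/min aggregation over a point's neighbours. This is a simplification of the paper's modules, with `neighbors[i]` listing the graph neighbours of point `i`:

```python
def graph_dilate(scores, neighbors):
    # 'Inside-out' pathway: dilation replaces each point's activation with
    # the max over itself and its graph neighbours, spreading foreground
    # evidence outwards (counters under-activation).
    return [max([scores[i]] + [scores[j] for j in neighbors[i]])
            for i in range(len(scores))]

def graph_erode(scores, neighbors):
    # 'Outside-in' pathway: erosion takes the min over the same
    # neighbourhood, shrinking over-activated regions back towards
    # confident cores.
    return [min([scores[i]] + [scores[j] for j in neighbors[i]])
            for i in range(len(scores))]
```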
Weakly supervised point cloud segmentation via deep morphological semantic information embedding. CAAI Transactions on Intelligence Technology, 9(3): 695–708, 2023. DOI: 10.1049/cit2.12239.
Muhammad Nouman Noor, Muhammad Nazir, Imran Ashraf, N. Almujally, Muhammad Aslam, Syeda Fizzah Jilani
GastroNet: A robust attention-based deep learning and cosine similarity feature selection framework for gastrointestinal disease classification from endoscopic images. CAAI Transactions on Intelligence Technology, 2023. DOI: 10.1049/cit2.12231.
Siwei Ma, Maoguo Gong, Guojun Qi, Yun Tie, Ivan Lee, Bo Li, Cong Jin
<p>The metaverse is a new type of Internet application and social form that integrates a variety of new technologies, including artificial intelligence, digital twins, blockchain, cloud computing, virtual reality, robots, brain-computer interfaces, and 5G. Media convergence technology is a systematic and comprehensive discipline that applies the theories and methods of modern science and technology to media innovation, covering multimedia creation, production, communication, service, consumption, reproduction and so on. The emergence of new technologies such as deep learning, distributed computing, and extended reality has promoted the development of media integration in the metaverse, and these technologies are the key factors driving the current transformation of the Internet towards the metaverse.</p><p>This Special Issue collects research on the application of media convergence and intelligent technology in the metaverse, focussing on the theory and technology of intelligent generation of multimedia content based on deep learning; intelligent recommendation algorithms for media content with privacy protection at their core; prediction models for multimedia communication based on big data analysis; immersive experience technology (VR/AR) in the metaverse and multimedia communication; resource allocation algorithms for ultra-high-definition video transmission and storage on 5G/6G mobile Internet; and neural network-based media content encryption algorithms. Original research and review articles are welcome.</p><p>The first article defines a comprehensive information loss that considers both the suppression of records and the relationship between sensitive attributes [<span>1</span>]. A heuristic method is leveraged to discover the optimal anonymity scheme with the lowest comprehensive information loss. The experimental results verify the practicality of the proposed data publishing method with multiple sensitive attributes. The proposed method can guarantee information utility when compared with previous ones.</p><p>The second article addresses the problem that existing models segment poorly on imbalanced data sets with small-scale samples: a bilateral U-Net network model with a spatial attention mechanism is designed [<span>2</span>]. The model uses the lightweight MobileNetV2 as the backbone network for hierarchical feature extraction and proposes an Attentive Pyramid Spatial Attention (APSA) module that, compared to the Attenuated Spatial Pyramid module, increases the receptive field and enhances the information; it finally adds a context fusion prediction branch that fuses high-semantic and low-semantic prediction results, effectively improving segmentation accuracy on small data sets. The experimental results on the CamVid data set show that, compared with some existing semantic segmentation networks, the algorithm achieves a better segmentation effect and accuracy.</p>
Guest Editorial: Special issue on media convergence and intelligent technology in the metaverse. CAAI Transactions on Intelligence Technology, 8(2): 285–287, 2023. DOI: 10.1049/cit2.12250.
Hania Tarik, Shahzad Hassan, R. A. Naqvi, Saddaf Rubab, Usman Tariq, Monia Hamdi, H. Elmannai, Ye Jin Kim, Jaehyuk Cha
Empowering and conquering infirmity of visually impaired using AI-technology equipped with object detection and real-time voice feedback system in healthcare application. CAAI Transactions on Intelligence Technology, 2023. DOI: 10.1049/cit2.12243.
Since fully convolutional networks achieved great success in semantic segmentation, many works have been proposed to extract discriminative pixel representations. However, the authors observe that existing methods still suffer from two typical challenges: (i) the intra-class feature variation between different scenes may be large, making it difficult to maintain consistency between same-class pixels from different scenes; (ii) the inter-class feature distinction within the same scene can be small, limiting the ability to distinguish different classes in each scene. The authors first rethink semantic segmentation from the perspective of similarity between pixels and class centers. Each weight vector of the segmentation head represents its corresponding semantic class over the whole dataset and can be regarded as the embedding of that class center. Thus, pixel-wise classification amounts to computing similarity in the final feature space between pixels and the class centers. Under this novel view, the authors propose a Class Center Similarity (CCS) layer to address the above challenges by generating adaptive class centers conditioned on each scene and supervising the similarities between class centers. The CCS layer utilises an Adaptive Class Center Module to generate class centers conditioned on each scene, which adapts to the large intra-class variation between different scenes. A specially designed Class Distance Loss (CD Loss) is introduced to control both inter-class and intra-class distances based on the predicted center-to-center and pixel-to-center similarities. Finally, the CCS layer outputs the processed pixel-to-center similarity as the segmentation prediction. Extensive experiments demonstrate that the model performs favourably against state-of-the-art methods.
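The pixel-to-center view can be illustrated with plain dot-product similarity; the CCS layer additionally adapts the centres per scene and supervises centre-to-centre similarity, which this sketch omits:

```python
def classify_pixels(pixel_feats, class_centers):
    # Pixel-wise classification as similarity search: each pixel embedding
    # is assigned the class whose centre vector it is most similar to.
    # With plain dot products this is exactly what a linear segmentation
    # head computes, reinterpreting its weight rows as class centres.
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return [max(range(len(class_centers)),
                key=lambda c: dot(f, class_centers[c]))
            for f in pixel_feats]
```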
Dongyue Wu, Zilin Guo, Aoyan Li, Changqian Yu, Nong Sang, Changxin Gao. Semantic segmentation via pixel-to-center similarity calculation. CAAI Transactions on Intelligence Technology, 9(1): 87–100, 2023. DOI: 10.1049/cit2.12245.
Hemanth Gudaparthi, Nan Niu, Yilong Yang, Matthew Van Doren, Reese Johnson
Combined sewer overflows represent significant risks to human health as untreated water is discharged to the environment. Municipalities, such as the Metropolitan Sewer District of Greater Cincinnati (MSDGC), recently began collecting large amounts of water-related data and considering the adoption of deep learning (DL) solutions, such as recurrent neural networks (RNNs), for predicting overflow events. Assessing DL's fitness for purpose requires a systematic understanding of the problem context. In this study, we propose a requirements engineering framework that uses problem frames to identify and structure stakeholder concerns, analyses the physical situations in which the high-quality data assumptions may not hold, and derives software testing criteria in the form of metamorphic relations that incorporate both input transformations and output comparisons. Applying our framework to MSDGC's overflow prediction problem enables a principled way to evaluate different RNN solutions against the requirements.
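A metamorphic relation of the kind such a framework derives might look as follows; the rainfall-monotonicity property and the `model` interface are hypothetical illustrations, not taken from the paper:

```python
def check_monotonic_rainfall_relation(model, rainfall_series, scale=1.2):
    # Metamorphic relation (hypothetical example): amplifying the rainfall
    # input should never lower the predicted overflow. No ground-truth
    # oracle is needed; only the two predictions are compared, which is
    # the point of metamorphic testing. `model` is any callable mapping
    # a rainfall series to a predicted overflow value.
    base = model(rainfall_series)
    followup = model([r * scale for r in rainfall_series])
    return followup >= base
```

Running such checks against several candidate RNNs gives a requirements-driven comparison even when labelled overflow data is scarce.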
Deep learning's fitness for purpose: A transformation problem frame's perspective. CAAI Transactions on Intelligence Technology, 8(2): 343–354, 2023. DOI: 10.1049/cit2.12237.
Sandra Carrasco Limeros, Sylwia Majchrowska, Joakim Johnander, Christoffer Petersson, Miguel Ángel Sotelo, David Fernández Llorca
Predicting the motion of other road agents enables autonomous vehicles to perform safe and efficient path planning. This task is very complex, as the behaviour of road agents depends on many factors and the number of possible future trajectories can be considerable (multi-modal). Most prior approaches to multi-modal motion prediction are based on complex machine learning systems with limited interpretability. Moreover, the metrics used in current benchmarks do not evaluate all aspects of the problem, such as the diversity and admissibility of the output. The authors aim to advance towards the design of trustworthy motion prediction systems, guided by some of the requirements for Trustworthy Artificial Intelligence. The focus is on evaluation criteria, robustness, and interpretability of outputs. First, the evaluation metrics are comprehensively analysed, the main gaps of current benchmarks are identified, and a new holistic evaluation framework is proposed. Then, a method for the assessment of spatial and temporal robustness is introduced by simulating noise in the perception system. To enhance the interpretability of the outputs and generate more balanced results in the proposed evaluation framework, an intent prediction layer that can be attached to multi-modal motion prediction models is proposed. The effectiveness of this approach is assessed through a survey that explores different elements in the visualisation of the multi-modal trajectories and intentions. The proposed approach and findings make a significant contribution to the development of trustworthy motion prediction systems for autonomous vehicles, advancing the field towards greater safety and reliability.
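As an example of the metrics under analysis, minimum Average Displacement Error (minADE), a standard multi-modal benchmark metric, scores only the best of the K predicted modes, which is one reason diversity and admissibility can go unmeasured. A minimal sketch:

```python
def min_ade(predicted_modes, ground_truth):
    # Average displacement error of each candidate trajectory against the
    # ground truth, then the minimum over all K predicted modes: one mode
    # close to the truth scores well even if the remaining modes are poor,
    # so the metric says nothing about their diversity or admissibility.
    def ade(traj):
        dists = [((x - gx) ** 2 + (y - gy) ** 2) ** 0.5
                 for (x, y), (gx, gy) in zip(traj, ground_truth)]
        return sum(dists) / len(dists)
    return min(ade(mode) for mode in predicted_modes)
```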
Towards trustworthy multi-modal motion prediction: Holistic evaluation and interpretability of outputs. CAAI Transactions on Intelligence Technology, 9(3): 557–572, 2023. DOI: 10.1049/cit2.12244.
Eye health has become a global health concern and attracted broad attention. Over the years, researchers have proposed many state-of-the-art convolutional neural networks (CNNs) to assist ophthalmologists in diagnosing ocular diseases efficiently and precisely. However, most existing methods were dedicated to constructing sophisticated CNNs, inevitably ignoring the trade-off between performance and model complexity. To address this trade-off, this paper proposes a lightweight yet efficient network architecture, mixed-decomposed convolutional network (MDNet), to recognise ocular diseases. In MDNet, we introduce a novel mixed-decomposed depthwise convolution method, which takes advantage of depthwise convolution and depthwise dilated convolution operations to capture low-resolution and high-resolution patterns using fewer computations and fewer parameters. We conduct extensive experiments on the clinical anterior segment optical coherence tomography (AS-OCT), LAG, University of California San Diego, and CIFAR-100 datasets. The results show our MDNet achieves a better trade-off between performance and model complexity than efficient CNNs including MobileNets and MixNets. Specifically, our MDNet outperforms MobileNets by 2.5% in accuracy while using 22% fewer parameters and 30% fewer computations on the AS-OCT dataset.
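The parameter savings behind decomposed convolutions can be illustrated with the standard depthwise-separable accounting popularised by MobileNets (the abstract does not give MDNet's exact formula, so the counts below are a generic sketch, not the paper's architecture). A dilated depthwise kernel has the same parameter count as a plain depthwise kernel but a larger receptive field, so splitting channels between the two branches captures both fine and coarse patterns at no extra parameter cost:

```python
def conv_params(c_in, c_out, k):
    # Standard k x k convolution: every output channel sees every input channel.
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # Depthwise k x k (one filter per input channel) followed by a
    # 1 x 1 pointwise convolution that mixes channels.
    return c_in * k * k + c_in * c_out

standard = conv_params(128, 128, 3)
decomposed = depthwise_separable_params(128, 128, 3)
# A dilated depthwise branch adds nothing to the count: dilation
# spreads the same 3 x 3 weights over a wider spatial extent.
print(standard, decomposed, round(decomposed / standard, 3))
# -> 147456 17536 0.119
```

For a 128-channel 3x3 layer the decomposed form needs roughly 12% of the parameters of a standard convolution, which is the kind of headroom that lets MDNet trade a small amount of capacity for large reductions in parameters and computation.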
{"title":"Mixed-decomposed convolutional network: A lightweight yet efficient convolutional neural network for ocular disease recognition","authors":"Xiaoqing Zhang, Xiao Wu, Zunjie Xiao, Lingxi Hu, Zhongxi Qiu, Qingyang Sun, Risa Higashita, Jiang Liu","doi":"10.1049/cit2.12246","DOIUrl":"10.1049/cit2.12246","url":null,"abstract":"<p>Eye health has become a global health concern and attracted broad attention. Over the years, researchers have proposed many state-of-the-art convolutional neural networks (CNNs) to assist ophthalmologists in diagnosing ocular diseases efficiently and precisely. However, most existing methods were dedicated to constructing sophisticated CNNs, inevitably ignoring the trade-off between performance and model complexity. To alleviate this paradox, this paper proposes a lightweight yet efficient network architecture, mixed-decomposed convolutional network (MDNet), to recognise ocular diseases. In MDNet, we introduce a novel mixed-decomposed depthwise convolution method, which takes advantage of depthwise convolution and depthwise dilated convolution operations to capture low-resolution and high-resolution patterns by using fewer computations and fewer parameters. We conduct extensive experiments on the clinical anterior segment optical coherence tomography (AS-OCT), LAG, University of California San Diego, and CIFAR-100 datasets. The results show our MDNet achieves a better trade-off between the performance and model complexity than efficient CNNs including MobileNets and MixNets. 
Specifically, our MDNet outperforms MobileNets by <b>2.5%</b> of accuracy by using <b>22%</b> fewer parameters and <b>30%</b> fewer computations on the AS-OCT dataset.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"9 2","pages":"319-332"},"PeriodicalIF":5.1,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12246","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86230090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}