Illegal construction has caused serious harm around the world. However, current methods struggle to detect illegal construction activities in time, and their computational complexity and parameter counts are large. To solve these challenges, a new detection method is proposed that discovers illegal construction activities by detecting objects related to illegal buildings in time. Meanwhile, a new dataset and a high-precision, lightweight detector named YDHNet are proposed. YDHNet is based on the You Only Look Once (YOLOv4) algorithm. Using DenseNet as the backbone of YDHNet enables better feature transfer and reuse, improves detection accuracy, and reduces computational costs. Meanwhile, depthwise separable convolution is employed to make the neck and head lightweight and further reduce computational costs. Furthermore, H-swish is utilized to enhance non-linear feature extraction and improve detection accuracy. Experimental results illustrate that YDHNet achieves a mean average precision of 89.60% on the proposed dataset, which is 3.78% higher than YOLOv4. The computational cost and parameter size of YDHNet are 26.22 GFLOPs and 16.18 MB, respectively. Compared to YOLOv4 and other detectors, YDHNet not only has lower computational costs and higher detection accuracy, but also identifies illegal construction objects in a timely manner and automatically detects illegal construction activities.
{"title":"A New High-Precision and Lightweight Detection Model for Illegal Construction Objects Based on Deep Learning","authors":"Wenjin Liu;Lijuan Zhou;Shudong Zhang;Ning Luo;Min Xu","doi":"10.26599/TST.2023.9010090","DOIUrl":"https://doi.org/10.26599/TST.2023.9010090","url":null,"abstract":"Illegal construction has caused serious harm around the world. However, current methods are difficult to detect illegal construction activities in time, and the calculation complexity and the parameters of them are large. To solve these challenges, a new and unique detection method is proposed, which detects objects related to illegal buildings in time to discover illegal construction activities. Meanwhile, a new dataset and a high-precision and lightweight detector are proposed. The proposed detector is based on the algorithm You Only Look Once (YOLOv4). The use of DenseNet as the backbone of YDHNet enables better feature transfer and reuse, improves detection accuracy, and reduces computational costs. Meanwhile, depthwise separable convolution is employed to lightweight the neck and head to further reduce computational costs. Furthermore, H-swish is utilized to enhance non-linear feature extraction and improve detection accuracy. Experimental results illustrate that YDHNet realizes a mean average precision of 89.60% on the proposed dataset, which is 3.78% higher than YOLOv4. The computational cost and parameter count of YDHNet are 26.22 GFLOPs and 16.18 MB, respectively. Compared to YOLOv4 and other detectors, YDHNet not only has lower computational costs and higher detection accuracy, but also timely identifies illegal construction objects and automatically detects illegal construction activities.","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"29 4","pages":"1002-1022"},"PeriodicalIF":6.6,"publicationDate":"2024-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10431753","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139715132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-02-09, DOI: 10.26599/TST.2023.9010081
Wujie Hu;Jinzhao Wu
Information networks are becoming increasingly important in practice. However, their escalating complexity is gradually impeding the efficiency of data mining. A novel network schema called the Behavior Schema of Information Networks (BSIN) is proposed to address this issue. This work defines the behavior of nodes as connected paths in BSIN, proposes a novel function to distinguish behavior differences, and introduces approximate bisimulation into the acquisition of quotient sets for node types. The major highlight of BSIN is its ability to directly obtain a high-efficiency network on the basis of approximate bisimulation, rather than reducing the existing information network. It provides an effective representation of information networks, and the resulting novel network has a simple structure that expresses semantic information more efficiently than current network representations. Theoretical analysis of the connected paths between the original and the obtained networks demonstrates that errors are controllable and semantic information is approximately retained. Case studies show that BSIN yields a simple network and is highly cost-effective.
{"title":"BSIN: A Behavior Schema of Information Networks Based on Approximate Bisimulation","authors":"Wujie Hu;Jinzhao Wu","doi":"10.26599/TST.2023.9010081","DOIUrl":"https://doi.org/10.26599/TST.2023.9010081","url":null,"abstract":"Information networks are becoming increasingly important in practice. However, their escalating complexity is gradually impeding the efficiency of data mining. A novel network schema called the Behavior Schema of Information Networks (BSIN) is proposed to address this issue. This work defines the behavior of nodes as connected paths in BSIN, proposes a novel function distinguish behavior differences, and introduces approximate bisimulation into the acquisition of quotient sets for node types. The major highlight of BSIN is its ability to directly obtain a high-efficiency network on the basis of approximate bisimulation, rather than reducing the existing information network. It provides an effective representation of information networks, and the resulting novel network has a simple structure that more efficiently expresses semantic information than current network representations. The theoretical analysis of the connected paths between the original and the obtained networks demonstrates that errors are controllable; and semantic information is approximately retained. Case studies show that BSIN yields a simple network and is highly cost-effective.","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"29 4","pages":"1092-1104"},"PeriodicalIF":6.6,"publicationDate":"2024-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10431733","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139715148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-02-09, DOI: 10.26599/TST.2023.9010056
N. Meenakshi;Sultan Ahmad;A. V. Prabu;J. Nageswara Rao;Nashwan Adnan Othman;Hikmat A.M. Abdeljaber;R. Sekar;Jabeen Nazeer
A Wireless Sensor Network (WSN) is a network constructed in regions that are inaccessible to human beings. The widespread deployment of wireless micro sensors makes it possible to conduct accurate environmental monitoring for use in both civil and military environments. The sensors use these data to monitor and track the physical conditions of the surrounding environment in order to ensure the sustainability of the area. The data have to be picked up by the sensors and then sent to the sink node, where they can be processed. The nodes of a WSN are powered by batteries and therefore eventually run out of power. This energy restriction affects the network lifespan and environmental sustainability. The objective of this study is to further improve the energy efficiency of the Engroove Leach (EL) protocol so that the network can operate for a very long time while consuming the least amount of energy. The lifespan of WSNs is often extended using clustering and routing strategies. The Meta Inspired Hawks Fragment Optimization (MIHFO) system, which is based on passive clustering, is used in this study to perform clustering. The cluster head is chosen based on the nodes' residual energy, distance to neighbors, distance to the Base Station (BS), node degree, and node centrality. Based on distance, residual energy, and node degree, an algorithm known as Heuristic Wing Antfly Optimization (HWAFO) selects the optimum path between the cluster head and the BS. The experiments examine the number of active nodes, their energy consumption, and the number of data packets received by the BS. All experiments are carried out in the MATLAB environment. The analysis shows that the proposed approach yields noticeably superior results in terms of throughput, packet delivery ratio, drop ratio, and average energy consumption.
{"title":"Efficient Communication in Wireless Sensor Networks Using Optimized Energy Efficient Engroove Leach Clustering Protocol","authors":"N. Meenakshi;Sultan Ahmad;A. V. Prabu;J. Nageswara Rao;Nashwan Adnan Othman;Hikmat A.M. Abdeljaber;R. Sekar;Jabeen Nazeer","doi":"10.26599/TST.2023.9010056","DOIUrl":"https://doi.org/10.26599/TST.2023.9010056","url":null,"abstract":"The Wireless Sensor Network (WSN) is a network that is constructed in regions that are inaccessible to human beings. The widespread deployment of wireless micro sensors will make it possible to conduct accurate environmental monitoring for a use in both civil and military environments. They make use of these data to monitor and keep track of the physical data of the surrounding environment in order to ensure the sustainability of the area. The data have to be picked up by the sensor, and then sent to the sink node where they may be processed. The nodes of the WSNs are powered by batteries, therefore they eventually run out of power. This energy restriction has an effect on the network life span and environmental sustainability. The objective of this study is to further improve the Engroove Leach (EL) protocol's energy efficiency so that the network can operate for a very long time while consuming the least amount of energy. The lifespan of WSNs is being extended often using clustering and routing strategies. The Meta Inspired Hawks Fragment Optimization (MIHFO) system, which is based on passive clustering, is used in this study to do clustering. The cluster head is chosen based on the nodes' residual energy, distance to neighbors, distance to base station, node degree, and node centrality. Based on distance, residual energy, and node degree, an algorithm known as Heuristic Wing Antfly Optimization (HWAFO) selects the optimum path between the cluster head and Base Station (BS). They examine the number of nodes that are active, their energy consumption, and the number of data packets that the BS receives. The overall experimentation is carried out under the MATLAB environment. From the analysis, it has been discovered that the suggested approach yields noticeably superior outcomes in terms of throughput, packet delivery and drop ratio, and average energy consumption.","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"29 4","pages":"985-1001"},"PeriodicalIF":6.6,"publicationDate":"2024-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10431750","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139715204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-02-09, DOI: 10.26599/TST.2023.9010068
Suwei Wang;Xiang Ma;Xuemei Li
Edges are the key information in the process of image smoothing. Some edges, especially weak edges, are difficult to maintain, which results in local areas being over-smoothed. To protect weak edges, we propose an image smoothing algorithm based on global sparse structure and parameter adaptation. The algorithm decomposes the image into high-frequency and low-frequency parts based on the global sparse structure. The low-frequency part contains less texture information and is relatively easy to smooth. The high-frequency part is more sensitive to edge information, so it is more suitable for selecting smoothing parameters. To reduce the computational complexity and improve the effect, we propose a bicubic polynomial fitting method to fit all the sample values into a surface. Finally, we use the Alternating Direction Method of Multipliers (ADMM) to unify the whole algorithm and obtain the smoothed results by iterative optimization. Comparisons with traditional methods and deep learning methods, as well as application tasks such as edge extraction, image abstraction, pseudo-boundary removal, and image enhancement, show that our algorithm preserves local weak edges more effectively and produces smoothed results with better visual quality.
{"title":"A Parameter Adaptive Method for Image Smoothing","authors":"Suwei Wang;Xiang Ma;Xuemei Li","doi":"10.26599/TST.2023.9010068","DOIUrl":"https://doi.org/10.26599/TST.2023.9010068","url":null,"abstract":"Edge is the key information in the process of image smoothing. Some edges, especially the weak edges, are difficult to maintain, which result in the local area being over-smoothed. For the protection of weak edges, we propose an image smoothing algorithm based on global sparse structure and parameter adaptation. The algorithm decomposes the image into high frequency and low frequency part based on global sparse structure. The low frequency part contains less texture information which is relatively easy to smoothen. The high frequency part is more sensitive to edge information so it is more suitable for the selection of smoothing parameters. To reduce the computational complexity and improve the effect, we propose a bicubic polynomial fitting method to fit all the sample values into a surface. Finally, we use Alternating Direction Method of Multipliers (ADMM) to unify the whole algorithm and obtain the smoothed results by iterative optimization. Compared with traditional methods and deep learning methods, as well as the application tasks of edge extraction, image abstraction, pseudo-boundary removal, and image enhancement, it shows that our algorithm can preserve the local weak edge of the image more effectively, and the visual effect of smoothed results is better.","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"29 4","pages":"1138-1151"},"PeriodicalIF":6.6,"publicationDate":"2024-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10431736","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139715205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-02-09, DOI: 10.26599/TST.2023.9010105
Zhuolun Jiang;Zefei Ning;Hao Miao;Li Wang
Long-term multivariate time series forecasting is an important task in engineering applications. It helps grasp the future development trend of data in real time, which is of great significance to a wide variety of fields. Due to the non-linear and unstable characteristics of multivariate time series, existing methods encounter difficulties in analyzing complex high-dimensional data and capturing latent relationships between the variables of a time series, which limits long-term prediction performance. In this paper, we propose a novel time series forecasting model based on the multilayer perceptron that combines spatio-temporal decomposition and doubly residual stacking, namely the Spatio-Temporal Decomposition Neural Network (STDNet). We decompose the originally complex and unstable time series into two parts: a temporal term and a spatial term. We design a temporal module based on an auto-correlation mechanism to discover temporal dependencies at the sub-series level, and a spatial module based on a convolutional neural network and a self-attention mechanism to integrate multivariate information from two dimensions, global and local, respectively. We then integrate the results obtained from the different modules to get the final forecast. Extensive experiments on four real-world datasets show that STDNet significantly outperforms other state-of-the-art methods, providing an effective solution for long-term time series forecasting.
{"title":"STDNet: A Spatio-Temporal Decomposition Neural Network for Multivariate Time Series Forecasting","authors":"Zhuolun Jiang;Zefei Ning;Hao Miao;Li Wang","doi":"10.26599/TST.2023.9010105","DOIUrl":"https://doi.org/10.26599/TST.2023.9010105","url":null,"abstract":"Long-term multivariate time series forecasting is an important task in engineering applications. It helps grasp the future development trend of data in real-time, which is of great significance for a wide variety of fields. Due to the non-linear and unstable characteristics of multivariate time series, the existing methods encounter difficulties in analyzing complex high-dimensional data and capturing latent relationships between multivariates in time series, thus affecting the performance of long-term prediction. In this paper, we propose a novel time series forecasting model based on multilayer perceptron that combines spatio-temporal decomposition and doubly residual stacking, namely Spatio-Temporal Decomposition Neural Network (STDNet). We decompose the originally complex and unstable time series into two parts, temporal term and spatial term. We design temporal module based on auto-correlation mechanism to discover temporal dependencies at the sub-series level, and spatial module based on convolutional neural network and self-attention mechanism to integrate multivariate information from two dimensions, global and local, respectively. Then we integrate the results obtained from the different modules to get the final forecast. Extensive experiments on four real-world datasets show that STDNet significantly outperforms other state-of-the-art methods, which provides an effective solution for long-term time series forecasting.","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"29 4","pages":"1232-1247"},"PeriodicalIF":6.6,"publicationDate":"2024-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10431747","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139715229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-02-09, DOI: 10.26599/TST.2023.9010085
Shasha Li;Tiejun Cui;Wattana Viriyasitavat
In a smart system, the faults of edge devices directly impact the system's overall fault. Further, complexity arises when different edge devices provide varying fault data. To study the Smart System Fault Evolution Process (SSFEP) under different fault data conditions, an intelligent method for determining the Smart System Fault Probability (SSFP) is proposed. The data types provided by edge devices include the following: (1) only the edge device fault probability is known; (2) the Edge Device Fault Probability Distribution (EDFPD) is known; (3) the edge device fault number and the EDFPD are known; (4) the factor state of the edge device fault and the EDFPD are known. Decision methods are proposed for each data case. Transfer Probability (TP) is divided into Continuity Transfer Probability (CTP) and Filterability Transfer Probability (FTP). CTP asserts that a Cause Event (CE) must lead to a Result Event (RE), while FTP requires the CE probability to exceed a threshold before the RE occurs. These probabilities are used to calculate the SSFP. This paper introduces a decision method based on the information diffusion principle for determining the SSFP with little data, along with an improved method. The method is based on space fault network theory, abstracting the SSFEP into a System Fault Evolution Process (SFEP) for research purposes.
{"title":"Edge Device Fault Probability Based Intelligent Calculations for Fault Probability of Smart Systems","authors":"Shasha Li;Tiejun Cui;Wattana Viriyasitavat","doi":"10.26599/TST.2023.9010085","DOIUrl":"https://doi.org/10.26599/TST.2023.9010085","url":null,"abstract":"In a smart system, the faults of edge devices directly impact the system's overall fault. Further, complexity arises when different edge devices provide varying fault data. To study the Smart System Fault Evolution Process (SSFEP) under different fault data conditions, an intelligent method for determining the Smart System Fault Probability (SSFP) is proposed. The data types provided by edge devices include the following: (1) only known edge device fault probability; (2) known Edge Device Fault Probability Distribution (EDFPD); (3) known edge device fault number and EDFPD; (4) known factor state of the edge device fault and EDFPD. Moreover, decision methods are proposed for each data case. Transfer Probability (TP) is divided into Continuity Transfer Probability (CTP) and Filterability Transfer Probability (FTP). CTP asserts that a Cause Event (CE) must lead to a Result Event (RE), while FTP requires CF probability to exceed a threshold before RF occurs. These probabilities are used to calculate SSFP. This paper introduces a decision method using the information diffusion principle for low-data SSFP determination, along with an improved method. The method is based on space fault network theory, abstracting SSFEP into a System Fault Evolution Process (SFEP) for research purposes.","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"29 4","pages":"1023-1036"},"PeriodicalIF":6.6,"publicationDate":"2024-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10431729","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139715243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-02-09, DOI: 10.26599/TST.2023.9010079
Xin Li;Zhikuan Wang;Chenglizhao Chen;Chunfeng Tao;Yuanbo Qiu;Junde Liu;Baile Sun
Most existing image inpainting methods aim to fill in the missing content in the inside-hole region of the target image. However, the areas to be restored in realistically degraded images are unspecified, and previous studies have failed to recover such degradations due to the absence of an explicit mask indication. Meanwhile, inconsistent patterns are blended in complex ways with the image content. Therefore, it is necessary to estimate whether certain pixels are out of distribution and to consider whether an object is consistent with its context. Motivated by these observations, a two-stage blind image inpainting network is proposed, which utilizes global semantic features of the image to locate semantically inconsistent regions and then generates reasonable content in those areas. Specifically, the representation differences between inconsistent and available content are first amplified, and the region to be restored is predicted iteratively from coarse to fine. A confidence-driven inpainting network based on the predicted masks is then used to estimate the information in the missing regions. Furthermore, a multiscale contextual aggregation module is introduced for spatial feature transfer to refine the generated contents. Extensive experiments over multiple datasets demonstrate that the proposed method can generate visually plausible and structurally complete results and is particularly effective in recovering diverse degraded images.
{"title":"SemID: Blind Image Inpainting with Semantic Inconsistency Detection","authors":"Xin Li;Zhikuan Wang;Chenglizhao Chen;Chunfeng Tao;Yuanbo Qiu;Junde Liu;Baile Sun","doi":"10.26599/TST.2023.9010079","DOIUrl":"https://doi.org/10.26599/TST.2023.9010079","url":null,"abstract":"Most existing image inpainting methods aim to fill in the missing content in the inside-hole region of the target image. However, the areas to be restored in realistically degraded images are unspecified. Previous studies have failed to recover the degradations due to the absence of the explicit mask indication. Meanwhile, inconsistent patterns are blended complexly with the image content. Therefore, estimating whether certain pixels are out of distribution and considering whether the object is consistent with the context is necessary. Motivated by these observations, a two-stage blind image inpainting network, which utilizes global semantic features of the image to locate semantically inconsistent regions and then generates reasonable content in the areas, is proposed. Specifically, the representation differences between inconsistent and available content are first amplified, iteratively predicting the region to be restored from coarse to fine. A confidence-driven inpainting network based on prediction masks is then used to estimate the information regarding missing regions. Furthermore, a multiscale contextual aggregation module is introduced for spatial feature transfer to refine the generated contents. Extensive experiments over multiple datasets demonstrate that the proposed method can generate visually plausible and structurally complete results that are particularly effective in recovering diverse degraded images.","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"29 4","pages":"1053-1068"},"PeriodicalIF":6.6,"publicationDate":"2024-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10431730","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139715254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-02-09, DOI: 10.26599/TST.2023.9010070
Tianyu Liu;Fuchun Sun
Oropharyngeal swabbing is a pre-diagnostic procedure used to test for various respiratory diseases, including COVID-19 and influenza A (H1N1). To improve testing efficiency, a real-time, accurate, and robust sampling-point localization algorithm is needed for robots. However, current solutions rely heavily on visual input, which is not reliable enough for large-scale deployment. The transformer has significantly improved the performance of image-related tasks and challenged the dominance of traditional convolutional neural networks (CNNs) in the image field. Inspired by its success, we propose a novel self-aligning multi-modal transformer (SAMMT) to dynamically attend to different parts of unaligned feature maps, preventing information loss caused by perspective disparity and simplifying the overall implementation. Unlike preexisting multi-modal transformers, our attention mechanism works in image space instead of embedding space, rendering the sensor registration process obsolete. To facilitate the multi-modal task, we collected an oropharynx localization/segmentation dataset annotated by trained medical personnel. This dataset is open-sourced and can be used for future multi-modal research. Our experiments show that our model improves the performance of the localization task by 4.2% compared to the pure visual model, and reduces the pixel-wise error rate of the segmentation task by 16.7% compared to the CNN baseline.
{"title":"Self-Aligning Multi-Modal Transformer for Oropharyngeal Swab Point Localization","authors":"Tianyu Liu;Fuchun Sun","doi":"10.26599/TST.2023.9010070","DOIUrl":"https://doi.org/10.26599/TST.2023.9010070","url":null,"abstract":"The oropharyngeal swabbing is a pre-diagnostic procedure used to test various respiratory diseases, including COVID and Influenza A (H1N1). To improve the testing efficiency of testing, a real-time, accurate, and robust sampling point localization algorithm is needed for robots. However, current solutions rely heavily on visual input, which is not reliable enough for large-scale deployment. The transformer has significantly improved the performance of image-related tasks and challenged the dominance of traditional convolutional neural networks (CNNs) in the image field. Inspired by its success, we propose a novel self-aligning multi-modal transformer (SAMMT) to dynamically attend to different parts of unaligned feature maps, preventing information loss caused by perspective disparity and simplifying overall implementation. Unlike preexisting multi-modal transformers, our attention mechanism works in image space instead of embedding space, rendering the need for the sensor registration process obsolete. To facilitate the multi-modal task, we collected and annotate an oropharynx localization/segmentation dataset by trained medical personnel. This dataset is open-sourced and can be used for future multi-modal research. Our experiments show that our model improves the performance of the localization task by 4.2% compared to the pure visual model, and reduces the pixel-wise error rate of the segmentation task by 16.7% compared to the CNN baseline.","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"29 4","pages":"1082-1091"},"PeriodicalIF":6.6,"publicationDate":"2024-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10431728","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139715203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-02-09, DOI: 10.26599/TST.2023.9010088
Azam Fazel-Najafabadi;Mahdi Abbasi;Hani H. Attar;Ayman Amer;Amir Taherkordi;Azad Shokrollahi;Mohammad R. Khosravi;Ahmed A. Solyman
The network switches in the data plane of Software Defined Networking (SDN) rely on an elementary process in which an enormous number of packets, which constitute big volumes of data, are classified into specific flows by matching them against a set of dynamic rules. This basic process accelerates data processing: instead of processing individual packets repeatedly, the corresponding actions are performed on whole flows of packets. In this paper, we first address the limitations of a typical packet classification algorithm, Tuple Space Search (TSS). Then, we present a set of scenarios for parallelizing it on different parallel processing platforms, including Graphics Processing Units (GPUs), clusters of Central Processing Units (CPUs), and hybrid clusters. Experimental results show that the hybrid cluster provides the best platform for parallelizing packet classification algorithms, achieving an average throughput of 4.2 million packets per second (Mpps). That is, the hybrid cluster produced by integrating the Compute Unified Device Architecture (CUDA), the Message Passing Interface (MPI), and the OpenMP programming model could classify 0.24 million packets per second more than the GPU cluster scheme. Such a packet classifier satisfies the processing speed required in programmable network systems used to communicate big medical data.
{"title":"High-Performance Flow Classification of Big Data Using Hybrid CPU-GPU Clusters of Cloud Environments","authors":"Azam Fazel-Najafabadi;Mahdi Abbasi;Hani H. Attar;Ayman Amer;Amir Taherkordi;Azad Shokrollahi;Mohammad R. Khosravi;Ahmed A. Solyman","doi":"10.26599/TST.2023.9010088","DOIUrl":"https://doi.org/10.26599/TST.2023.9010088","url":null,"abstract":"The network switches in the data plane of Software Defined Networking (SDN) are empowered by an elementary process, in which enormous number of packets which resemble big volumes of data are classified into specific flows by matching them against a set of dynamic rules. This basic process accelerates the processing of data, so that instead of processing singular packets repeatedly, corresponding actions are performed on corresponding flows of packets. In this paper, first, we address limitations on a typical packet classification algorithm like Tuple Space Search (TSS). Then, we present a set of different scenarios to parallelize it on different parallel processing platforms, including Graphics Processing Units (GPUs), clusters of Central Processing Units (CPUs), and hybrid clusters. Experimental results show that the hybrid cluster provides the best platform for parallelizing packet classification algorithms, which promises the average throughput rate of 4.2 Million packets per second (Mpps). That is, the hybrid cluster produced by the integration of Compute Unified Device Architecture (CUDA), Message Passing Interface (MPI), and OpenMP programming model could classify 0.24 million packets per second more than the GPU cluster scheme. Such a packet classifier satisfies the required processing speed in the programmable network systems that would be used to communicate big medical data.","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"29 4","pages":"1118-1137"},"PeriodicalIF":6.6,"publicationDate":"2024-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10431734","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139715255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-12-20, DOI: 10.26599/TST.2023.9010082
Bingyi Xie;Honghui Xu;YongJoon Joe;Daehee Seo;Zhipeng Cai
Deep learning based techniques are broadly used in various applications and exhibit superior performance compared to traditional methods. One of the mainstream topics in computer vision is the image super-resolution task. In recent deep learning neural networks, the number of parameters in each convolution layer has increased along with more layers and feature maps, resulting in better image super-resolution performance. Today, numerous service providers offer super-resolution services to users, providing them with remarkable convenience. However, the availability of open-source super-resolution services exposes service providers to the risk of copyright infringement, as the complete model could be vulnerable to leakage. Therefore, safeguarding the copyright of the complete model is a non-trivial concern. To tackle this issue, this paper presents a lightweight model as a substitute for the original complete model in image super-resolution. This research identifies smaller networks that can deliver impressive performance while protecting the original model's copyright. Finally, comprehensive experiments are conducted on multiple datasets to demonstrate the superiority of the proposed approach in generating super-resolution images even with lightweight neural networks.
{"title":"Lightweight Super-Resolution Model for Complete Model Copyright Protection","authors":"Bingyi Xie;Honghui Xu;YongJoon Joe;Daehee Seo;Zhipeng Cai","doi":"10.26599/TST.2023.9010082","DOIUrl":"https://doi.org/10.26599/TST.2023.9010082","url":null,"abstract":"Deep learning based techniques are broadly used in various applications, which exhibit superior performance compared to traditional methods. One of the mainstream topics in computer vision is the image super-resolution task. In recent deep learning neural networks, the number of parameters in each convolution layer has increased along with more layers and feature maps, resulting in better image super-resolution performance. In today's era, numerous service providers offer super-resolution services to users, providing them with remarkable convenience. However, the availability of open-source super-resolution services exposes service providers to the risk of copyright infringement, as the complete model could be vulnerable to leakage. Therefore, safeguarding the copyright of the complete model is a non-trivial concern. To tackle this issue, this paper presents a lightweight model as a substitute for the original complete model in image super-resolution. This research has identified smaller networks that can deliver impressive performance, while protecting the original model's copyright. Finally, comprehensive experiments are conducted on multiple datasets to demonstrate the superiority of the proposed approach in generating super-resolution images even using lightweight neural networks.","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"29 4","pages":"1194-1205"},"PeriodicalIF":6.6,"publicationDate":"2023-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10367775","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139715194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}