Pub Date : 2023-12-23 DOI: 10.1007/s11704-023-3142-5
Abstract
Federated Learning (FL) has emerged as a powerful technology for collaborative training between multiple clients and a server while preserving the privacy of client data. To further enhance privacy in FL, Differentially Private Federated Learning (DPFL) has gradually become one of the most effective approaches. Because DPFL operates in a distributed setting, potential adversaries may manipulate some clients and the aggregation server to produce malicious parameters and disturb the learning model. However, existing aggregation protocols for DPFL address either corrupted clients (Byzantines) or a corrupted server, but not both. Because of the more complicated threat model, such protocols cannot eliminate the effects of corrupted clients and a corrupted server when both are present simultaneously. In this paper, we elaborate this adversarial threat model and propose BVDFed. To the best of our knowledge, it is the first Byzantine-resilient and Verifiable aggregation for Differentially private FEDerated learning. Specifically, we propose the Differentially Private Federated Averaging algorithm (DPFA) as the primary workflow of BVDFed, which is more lightweight and more easily portable than traditional DPFL algorithms. We then introduce the Loss Score to indicate the trustworthiness of disguised gradients in DPFL. Based on the Loss Score, we propose an aggregation rule, DPLoss, to eliminate faulty gradients from Byzantine clients during server aggregation while preserving the privacy of clients’ data. Additionally, we design a secure verification scheme, DPVeri, that is compatible with DPFA and DPLoss and allows honest clients to verify the integrity of the aggregated results they receive. DPVeri also makes our aggregation resistant to collusion attacks involving no more than t participants. Theoretical analysis and experimental results demonstrate that our aggregation is feasible and effective in practice.
Title: BVDFed: Byzantine-resilient and verifiable aggregation for differentially private federated learning. Journal: Frontiers of Computer Science.
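To make the idea of differentially private federated averaging more concrete, the following is a minimal sketch of a generic clip-then-noise averaging round. It is not the paper's DPFA algorithm, whose details the abstract does not give; the function names, the clipping norm, and the noise multiplier are all illustrative assumptions.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale an update so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def privatize(update, clip_norm, noise_multiplier, rng):
    """Clip a client update, then add Gaussian noise (hypothetical client-side step)."""
    clipped = clip_update(update, clip_norm)
    sigma = noise_multiplier * clip_norm
    return [x + rng.gauss(0.0, sigma) for x in clipped]


def federated_average(updates):
    """Coordinate-wise mean of the already-noised client updates."""
    n = len(updates)
    return [sum(column) / n for column in zip(*updates)]


rng = random.Random(0)
client_updates = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]  # made-up toy updates
noisy = [privatize(u, clip_norm=1.0, noise_multiplier=0.1, rng=rng)
         for u in client_updates]
aggregate = federated_average(noisy)
```

Clipping bounds how much any single client can move the average, which is also what lets the noise scale be calibrated to `clip_norm`. A Byzantine-resilient rule would additionally filter updates before averaging; that step is not shown here.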
Pub Date : 2023-12-23 DOI: 10.1007/s11704-023-2751-3
Lijuan Ren, Liangxiao Jiang, Wenjun Zhang, Chaoqun Li
Abstract
In crowdsourcing scenarios, we can obtain each instance’s multiple noisy labels from different crowd workers and then infer its integrated label via label aggregation. In spite of the effectiveness of label aggregation methods, there still remains a certain level of noise in the integrated labels. Thus, some noise correction methods have been proposed to reduce the impact of noise in recent years. However, to the best of our knowledge, existing methods rarely consider an instance’s information from both its features and multiple noisy labels simultaneously when identifying a noise instance. In this study, we argue that the more distinguishable an instance’s features but the noisier its multiple noisy labels, the more likely it is a noise instance. Based on this premise, we propose a label distribution similarity-based noise correction (LDSNC) method. To measure whether an instance’s features are distinguishable, we obtain each instance’s predicted label distribution by building multiple classifiers using instances’ features and their integrated labels. To measure whether an instance’s multiple noisy labels are noisy, we obtain each instance’s multiple noisy label distribution using its multiple noisy labels. Then, we use the Kullback-Leibler (KL) divergence to calculate the similarity between the predicted label distribution and multiple noisy label distribution and define the instance with the lower similarity as a noise instance. The extensive experimental results on 34 simulated and four real-world crowdsourced datasets validate the effectiveness of our method.
Title: Label distribution similarity-based noise correction for crowdsourcing. Journal: Frontiers of Computer Science.
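The comparison between a predicted label distribution and a crowd label distribution can be sketched as follows. This is only an illustration of the KL-divergence step: the Laplace smoothing, the direction of the divergence, and the toy numbers are assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def label_distribution(noisy_labels, classes, alpha=1.0):
    """Laplace-smoothed class distribution built from one instance's crowd labels."""
    counts = Counter(noisy_labels)
    total = len(noisy_labels) + alpha * len(classes)
    return [(counts[c] + alpha) / total for c in classes]


def kl_divergence(p, q):
    """KL(p || q); q must be strictly positive wherever p is positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


classes = ["cat", "dog"]
predicted = [0.9, 0.1]  # hypothetical classifier output for one instance
crowd = label_distribution(["dog", "dog", "cat", "dog"], classes)
score = kl_divergence(predicted, crowd)
```

A larger `score` means the two distributions disagree more, i.e. lower similarity. Under the abstract's premise, such instances would be the candidates for noise correction; the threshold for flagging them is left open here.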
Pub Date : 2023-12-23 DOI: 10.1007/s11704-023-2782-9
Abstract
In this paper, we propose the concept of delegable zero-knowledge succinct non-interactive arguments of knowledge (zk-SNARKs). A delegable zk-SNARK is parameterized by (μ, k, k′, k″). The delegable property allows the prover to delegate its proving ability to μ proxies. Any k honest proxies can generate a correct proof for a statement, but a collusion of fewer than k proxies obtains no information about the witness of the statement. We also define k′-soundness and k″-zero knowledge to take multiple proxies into account.
We propose a construction of a (μ, 2t+1, t, t)-delegable zk-SNARK for the NP-complete language of arithmetic circuit satisfiability. Our delegable zk-SNARK stems from Groth’s zk-SNARK scheme (Groth16). We take advantage of the additive and multiplicative properties of polynomial-based secret sharing schemes to achieve delegation for zk-SNARKs. Our secret sharing scheme works well with pairing groups, so the succinctness of Groth’s zk-SNARK scheme is preserved while the delegable property is added and soundness and zero knowledge are maintained in the multi-proxy setting.
Title: Delegable zk-SNARKs with proxies. Journal: Frontiers of Computer Science.
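The additive and multiplicative properties of polynomial-based secret sharing can be illustrated with a small Shamir-style sketch. It is a toy over a prime field, not the paper's construction (which works with pairing groups); the modulus and helper names are assumptions. It shows one standard reason a threshold of 2t+1 can arise: multiplying two degree-t sharings pointwise gives a degree-2t sharing of the product.

```python
import random

P = 2**61 - 1  # a Mersenne prime, used here as a toy field modulus


def share(secret, t, n, rng):
    """Split secret into n shares using a random degree-t polynomial."""
    coeffs = [secret % P] + [rng.randrange(P) for _ in range(t)]
    return [(x, sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]


def reconstruct(shares):
    """Lagrange-interpolate the shared polynomial at 0 from (x, y) pairs."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret


t, n = 2, 7
rng = random.Random(1)
a_shares = share(11, t, n, rng)
b_shares = share(5, t, n, rng)

# Each party combines its own two shares locally, without communication.
sum_shares = [(x, (ya + yb) % P) for (x, ya), (_, yb) in zip(a_shares, b_shares)]
prod_shares = [(x, ya * yb % P) for (x, ya), (_, yb) in zip(a_shares, b_shares)]

total = reconstruct(sum_shares[:t + 1])         # degree t: t + 1 shares suffice
product = reconstruct(prod_shares[:2 * t + 1])  # degree 2t: needs 2t + 1 shares
```

Whether this is the exact source of the 2t+1 parameter in the paper is not stated in the abstract; the sketch only shows the general behaviour of such sharings.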
Pub Date : 2023-12-23 DOI: 10.1007/s11704-023-2418-0
Jiacheng Li, Ruize Han, Wei Feng, Haomin Yan, Song Wang
Human interaction recognition is an essential task in video surveillance. Current work on human interaction recognition mainly focuses on scenarios that contain only close-contact interacting subjects and no other people. In this paper, we handle more practical but more challenging scenarios in which the interacting subjects are contactless and other subjects not involved in the interactions of interest are also present in the scene. To address this problem, we propose an Interactive Relation Embedding Network (IRE-Net) to simultaneously identify the subjects involved in the interaction and recognize their interaction category. Because this is a new problem, we also build a new dataset with annotations and metrics for performance evaluation. Experimental results on this dataset show that the proposed method significantly improves on current methods developed for human interaction recognition and group activity recognition.
Title: Contactless interaction recognition and interactor detection in multi-person scenes. Journal: Frontiers of Computer Science.
A large body of research has been dedicated to automated issue classification for Issue Tracking Systems (ITSs). Although existing approaches have shown promising performance, their different design choices, including the textual fields, feature representation methods, and machine learning algorithms they adopt, have not been comprehensively compared and analyzed. To fill this gap, we perform the first extensive study of automated issue classification on 9 state-of-the-art issue classification approaches. Our experimental results on a widely studied dataset reveal multiple practical guidelines for automated issue classification, including: (1) training separate models for the issue titles and descriptions and then combining these two models tends to achieve better performance; (2) word embedding with Long Short-Term Memory (LSTM) can better extract features from the textual fields in the issues and hence leads to better issue classification models; (3) certain terms in the textual fields are helpful for building more discriminating classifiers between bug and non-bug issues; (4) the performance of the issue classification model is not sensitive to the choice of ML algorithm. Based on our study outcomes, we further propose an advanced issue classification approach, DeepLabel, which achieves better performance than the existing issue classification approaches.
Title: Empirically revisiting and enhancing automatic classification of bug and non-bug issues. Authors: Zhong Li, Minxue Pan, Yu Pei, Tian Zhang, Linzhang Wang, Xuandong Li. Journal: Frontiers of Computer Science. Pub Date: 2023-12-23. DOI: 10.1007/s11704-023-2771-z
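Guideline (1), combining a title model with a description model, can be illustrated with a simple late-fusion sketch. The weighted average, the weight, and the threshold are assumptions chosen for illustration; the abstract does not say how the paper combines the two models.

```python
def fuse_probabilities(p_title, p_description, title_weight=0.5):
    """Weighted average of the two models' predicted bug probabilities."""
    return title_weight * p_title + (1.0 - title_weight) * p_description


def classify_issue(p_title, p_description, threshold=0.5):
    """Label an issue 'bug' when the fused probability reaches the threshold."""
    fused = fuse_probabilities(p_title, p_description)
    return "bug" if fused >= threshold else "non-bug"


# Hypothetical scores from two independently trained classifiers.
label_a = classify_issue(p_title=0.9, p_description=0.3)
label_b = classify_issue(p_title=0.2, p_description=0.4)
```

Keeping the two models separate lets each one specialise in its own text field, and the fusion weight can then be tuned on held-out data.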
Discourse relation classification is a fundamental task for discourse analysis, which is essential for understanding the structure and connection of texts. Implicit discourse relation classification aims to determine the relationship between adjacent sentences and is very challenging because it lacks both explicit discourse connectives as linguistic cues and sufficient annotated training data. In this paper, we propose a discriminative instance selection method to construct synthetic implicit discourse relation data from easy-to-collect explicit discourse relations. An expanded instance consists of an argument pair and its sense label. We introduce the argument pair type classification task, which aims to distinguish between implicit and explicit argument pairs and to select the explicit argument pairs that are most similar to natural implicit argument pairs for data expansion. We also propose a simple label-smoothing technique to assign robust sense labels to the selected argument pairs. We evaluate our method on PDTB 2.0 and PDTB 3.0. The results show that our method consistently improves the performance of the baseline model and achieves results competitive with state-of-the-art models.
Title: Discriminative explicit instance selection for implicit discourse relation classification. Authors: Wei Song, Hongfei Han, Xu Han, Miaomiao Cheng, Jiefu Gong, Shijin Wang, Ting Liu. Journal: Frontiers of Computer Science. Pub Date: 2023-12-23. DOI: 10.1007/s11704-023-3058-2
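For readers unfamiliar with label smoothing, the following sketch shows the standard form of the technique. The abstract does not specify the paper's variant, so the value of `epsilon` and the even spread over non-target classes are assumptions.

```python
def smooth_label(target_index, num_classes, epsilon=0.1):
    """Soft label: 1 - epsilon on the target, epsilon shared by the other classes."""
    off_value = epsilon / (num_classes - 1)
    return [1.0 - epsilon if k == target_index else off_value
            for k in range(num_classes)]


# A selected explicit argument pair whose sense label has index 2 of 4 senses.
soft = smooth_label(target_index=2, num_classes=4, epsilon=0.1)
```

Training against `soft` instead of a one-hot vector penalises the model less when a synthetic instance's sense label turns out to be wrong, which is the usual motivation for smoothing labels on automatically constructed data.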
Pub Date : 2023-12-23 DOI: 10.1007/s11704-023-2655-2
Abstract
Graph convolutional networks (GCNs) have become prevalent in recommender systems (RS) due to their superiority in modeling collaborative patterns. Although they improve overall accuracy, GCNs unfortunately amplify popularity bias: tail items are less likely to be recommended. This effect prevents GCN-based RS from making precise and fair recommendations, decreasing the effectiveness of recommender systems in the long run.
In this paper, we investigate how graph convolutions amplify the popularity bias in RS. Through theoretical analyses, we identify two fundamental factors: (1) with graph convolution (i.e., neighborhood aggregation), popular items exert larger influence than tail items on neighbor users, making the users move towards popular items in the representation space; (2) after multiple rounds of graph convolution, popular items affect more high-order neighbors and become more influential. Together, these two factors move popular items closer to almost all users, so they are recommended more frequently. To rectify this, we propose to estimate the amplified effect of popular nodes on each node’s representation and to intervene on this effect after each graph convolution. Specifically, we adopt clustering to discover highly influential nodes and estimate the amplification effect of each node, then remove the effect from the node embeddings at each graph convolution layer. Our method is simple and generic: it can be used in the inference stage to correct existing models rather than training a new model from scratch, and it can be applied to various GCN models. We demonstrate our method on two representative GCN backbones, LightGCN and UltraGCN, verifying its ability to improve the recommendations of tail items without sacrificing the performance of popular items. The code is open-sourced.
Title: How graph convolutions amplify popularity bias for recommendation? Journal: Frontiers of Computer Science.
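Neighborhood aggregation itself can be sketched on a toy interaction graph. The code below performs one LightGCN-style propagation step from items to users with symmetric degree normalization; the graph, the embeddings, and the function name are made up for illustration, and the sketch does not reproduce the paper's theoretical analysis or its debiasing method.

```python
import math

# Toy interaction graph: item "A" is popular, item "B" is a tail item.
interactions = {"u0": ["A", "B"], "u1": ["A"], "u2": ["A"]}
item_emb = {"A": [1.0, 0.0], "B": [0.0, 1.0]}

# Item degree = number of users who interacted with the item.
item_degree = {}
for items in interactions.values():
    for item in items:
        item_degree[item] = item_degree.get(item, 0) + 1


def propagate_to_users(interactions, item_emb, item_degree):
    """One aggregation step with weights 1 / sqrt(deg(user) * deg(item))."""
    dim = len(next(iter(item_emb.values())))
    user_emb = {}
    for user, items in interactions.items():
        agg = [0.0] * dim
        for item in items:
            w = 1.0 / math.sqrt(len(items) * item_degree[item])
            agg = [a + w * e for a, e in zip(agg, item_emb[item])]
        user_emb[user] = agg
    return user_emb


user_emb = propagate_to_users(interactions, item_emb, item_degree)
```

In this toy graph the popular item contributes to every user's aggregated embedding, while the tail item reaches a single user. The normalization actually gives each popular-item edge a smaller weight, so the sketch only illustrates reach; the amplification result itself rests on the paper's analysis.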
Pub Date : 2023-12-23 DOI: 10.1007/s11704-023-3026-8
Nan Sun, Wei Wang, Yongxin Tong, Kexin Liu
In the Internet of Things (IoT), data sharing among different devices can improve manufacturing efficiency and reduce workload, yet it also makes network systems more vulnerable to various intrusion attacks. There is a realistic demand for an efficient intrusion detection algorithm for connected devices. Most existing intrusion detection methods are trained in a centralized manner and are incapable of identifying new unlabeled attack types. In this paper, a distributed federated intrusion detection method is proposed that utilizes the information contained in the labeled data as prior knowledge to discover new unlabeled attack types. In addition, the blockchain technique is introduced into the federated learning process to reach consensus across the entire framework. Experimental results show that our approach can identify malicious entities while outperforming existing methods in discovering new intrusion attack types.
Title: Blockchain based federated learning for intrusion detection for Internet of Things. Journal: Frontiers of Computer Science.
Pub Date : 2023-12-23 DOI: 10.1007/s11704-023-3344-x
Peiquan Jin, Zhaole Chu, Gaocong Liu, Yongping Luo, Shouhong Wan
The advance in Non-Volatile Memory (NVM) has changed the traditional DRAM-only memory system. Compared to DRAM, NVM has the advantages of non-volatility and large capacity. However, as the read/write speed of NVM is still lower than that of DRAM, building DRAM/NVM-based hybrid memory systems is a feasible way of adding NVM into the current computer architecture. This paper aims to optimize the well-known B+-tree for hybrid memory. The novelty of this study is two-fold. First, we observed that the space utilization of internal nodes in B+-tree is generally below 70%. Inspired by this observation, we propose to maintain hot keys in the free space within internal nodes, yielding a new index named HATree (Hotness-Aware Tree). The new idea of HATree is to use the unused space of the parent of leaf nodes (PLNs) as the hotspot data cache. Thus, no extra space is needed, and the in-node hotspot cache can efficiently improve query performance. Second, to further improve the update performance of HATree, we propose to utilize the eADR technology supported by the third-generation Intel Xeon Scalable Processors to enhance HATree with instant log persistence, which results in the new HATree-Log structure. We conduct extensive experiments on real hybrid memory architecture involving DRAM and Intel Optane Persistent Memory to evaluate the performance of HATree and HATree-Log. Three state-of-the-art indices for hybrid memory, namely NBTree, LBTree, and FPTree, are included in the experiments, and the results suggest the efficiency of HATree and HATree-Log.
{"title":"Optimizing B+-tree for hybrid memory with in-node hotspot cache and eADR awareness","authors":"Peiquan Jin, Zhaole Chu, Gaocong Liu, Yongping Luo, Shouhong Wan","doi":"10.1007/s11704-023-3344-x","DOIUrl":"https://doi.org/10.1007/s11704-023-3344-x","PeriodicalId":12640,"journal":{"name":"Frontiers of Computer Science"},"PeriodicalIF":4.2,"publicationDate":"2023-12-23","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"139025409","RegionNum":3,"RegionCategory":"Computer Science"}
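The in-node hotspot cache described in the HATree abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the class name `ParentOfLeafNode`, the per-key hit counter, the "replace the coldest entry" policy, and the use of plain dicts as leaves are all assumptions made for illustration. It only shows the general idea that slots left unused by routing keys in a parent of leaf nodes (PLN) can hold hot key/value pairs, so a repeated lookup may be answered without visiting a leaf.

```python
from bisect import bisect_right
from collections import Counter


class ParentOfLeafNode:
    """Toy PLN: slots not used by separator keys hold hot key/value pairs."""

    def __init__(self, capacity, separators, leaves):
        # separators[i] is the smallest key that belongs to leaves[i + 1].
        assert len(leaves) == len(separators) + 1
        assert len(separators) <= capacity
        self.capacity = capacity      # total slots in the node
        self.separators = separators  # sorted routing keys
        self.leaves = leaves          # each leaf is a plain dict (assumption)
        self.hot = {}                 # in-node hotspot cache
        self.hits = Counter()         # hypothetical hotness statistic

    def free_slots(self):
        # Slots left over after the routing keys; only these may cache data.
        return self.capacity - len(self.separators)

    def lookup(self, key):
        # 1) Try the in-node cache first.
        if key in self.hot:
            self.hits[key] += 1
            return self.hot[key], "cache"
        # 2) Otherwise route to the leaf that covers the key.
        leaf = self.leaves[bisect_right(self.separators, key)]
        value = leaf.get(key)
        if value is not None:
            self.hits[key] += 1
            self._maybe_cache(key, value)
        return value, "leaf"

    def _maybe_cache(self, key, value):
        if self.free_slots() == 0:
            return                    # node is full of routing keys
        if len(self.hot) < self.free_slots():
            self.hot[key] = value     # a free slot is available
            return
        # Cache is full: replace the coldest entry only if this key is hotter.
        coldest = min(self.hot, key=lambda k: self.hits[k])
        if self.hits[key] > self.hits[coldest]:
            del self.hot[coldest]
            self.hot[key] = value
```

Because the cache lives only in slots the node would otherwise leave empty, it costs no extra space, which matches the claim in the abstract. A real index would also have to shrink the cache when new separator keys arrive and keep cached values consistent with updates; the sketch leaves both out.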
Pub Date : 2023-12-18 DOI: 10.1007/s11704-023-2714-8
Abstract
Accurate monitoring of urban waterlogging contributes to the city’s normal operation and the safety of residents’ daily travel. However, due to feedback delays or high costs, existing methods cannot support large-scale, fine-grained waterlogging monitoring. A common method is to forecast the city’s global waterlogging status using its partial waterlogging data. This method faces two challenges: first, existing predictive algorithms are driven by either knowledge or data alone; second, the partial waterlogging data is not collected selectively, resulting in poor predictions. To overcome these challenges, this paper proposes a framework for large-scale and fine-grained spatiotemporal waterlogging monitoring based on the opportunistic sensing of limited bus routes. This framework follows the Sparse Crowdsensing paradigm and mainly comprises a predictor and a selector that work iteratively. The predictor uses the collected waterlogging status and the predicted status of the uncollected area to train a graph convolutional neural network. It combines knowledge-driven and data-driven approaches and can be used to forecast waterlogging status in all regions for the upcoming term. The selector consists of a two-stage selection procedure that can select valuable bus routes while satisfying budget constraints. The experimental results on real waterlogging data and bus routes in Shenzhen show that the proposed framework can perform urban waterlogging monitoring with low cost, high accuracy, wide coverage, and fine granularity.
{"title":"Route selection for opportunity-sensing and prediction of waterlogging","doi":"10.1007/s11704-023-2714-8","DOIUrl":"https://doi.org/10.1007/s11704-023-2714-8","PeriodicalId":12640,"journal":{"name":"Frontiers of Computer Science"},"PeriodicalIF":4.2,"publicationDate":"2023-12-18","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"138743332","RegionNum":3,"RegionCategory":"Computer Science"}
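The abstract above does not spell out the two-stage selector, so the following Python sketch substitutes a generic technique: greedy budgeted selection by coverage gained per unit cost. It is only meant to make "select valuable bus routes while satisfying budget constraints" concrete. The function name `select_routes`, the use of region coverage as the measure of value, and the requirement that costs are positive are all assumptions, not details taken from the paper.

```python
def select_routes(routes, costs, budget):
    """Greedy budgeted selection (generic stand-in, not the paper's selector).

    routes: dict mapping route name -> set of region ids the route passes
    costs:  dict mapping route name -> positive cost of sensing on that route
    budget: total cost that may be spent
    """
    chosen, covered, spent = [], set(), 0
    remaining = set(routes)
    while remaining:
        best, best_ratio = None, 0.0
        # sorted() makes tie-breaking deterministic (first name wins).
        for name in sorted(remaining):
            if spent + costs[name] > budget:
                continue              # route does not fit in what is left
            gain = len(routes[name] - covered)
            ratio = gain / costs[name]
            if ratio > best_ratio:
                best, best_ratio = name, ratio
        if best is None:
            break                     # nothing affordable adds coverage
        chosen.append(best)
        covered |= routes[best]
        spent += costs[best]
        remaining.remove(best)
    return chosen, covered, spent
```

In a predictor/selector loop like the one the abstract describes, the value of a route would more plausibly come from how much it is expected to improve the prediction rather than from raw coverage; the ratio computed here marks the place where such a score would plug in.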