
Latest publications: 2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)

Load Balancing Optimization for Transformer in Distributed Environment
Pub Date: 2021-12-01 DOI: 10.1109/ICPADS53394.2021.00109
Delu Ma, Zhou Lei, Shengbo Chen, Peng-Cheng Wang
In recent years, demand for artificial intelligence applications has increased dramatically. Complex models enable machine learning to achieve excellent results, but computing efficiency has gradually become a bottleneck. Therefore, more researchers are exploring ways to improve the efficiency of intelligent computing systems. Distributed machine learning can improve the efficiency of model training and inference, but problems such as communication delay and load imbalance between computing nodes remain. In a multi-GPU distributed computing environment, this paper takes the vision algorithm ViT (vision transformer), which has the advantage of convenient parallel training, as the optimization object and proposes several related solutions. First, the parameter server is used as the system's logical architecture, and a device working-status query mechanism is designed to achieve load balancing by reducing the idleness of computing devices during training. Second, combined with a pre-trained small ViT model, a semi-asynchronous communication method is proposed to reduce the communication overhead of computing devices and accelerate global convergence. Experiments carried out in an existing distributed environment demonstrate that, compared with the existing synchronous method, computational efficiency is improved considerably at the cost of a slight reduction in accuracy.
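The semi-asynchronous idea described in the abstract can be illustrated with a toy parameter server that applies an update once K of N workers have reported, instead of waiting for all N. The class, the K-of-N rule, and the learning rate below are our own illustrative stand-ins; the paper's actual mechanism and hyperparameters are not given here.

```python
import threading

class SemiAsyncServer:
    """Toy parameter server: closes a round as soon as k of n workers
    have pushed a gradient (a stand-in for semi-asynchronous training)."""

    def __init__(self, n_workers, k):
        self.n, self.k = n_workers, k
        self.weights = 0.0          # scalar "model" for illustration
        self.pending = []           # gradients received this round
        self.lock = threading.Lock()

    def push_gradient(self, grad):
        """Called by a worker; returns True if this push closed the round."""
        with self.lock:
            self.pending.append(grad)
            if len(self.pending) >= self.k:   # K-of-N barrier
                # Apply the averaged partial gradient without waiting
                # for the remaining (slow) workers.
                self.weights -= 0.1 * sum(self.pending) / len(self.pending)
                self.pending.clear()
                return True
            return False
```

Stragglers that miss the barrier simply contribute to a later round, which is what keeps fast devices from idling.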
Citations: 0
Delica: Decentralized Lightweight Collective Attestation for Disruptive IoT Networks
Pub Date: 2021-12-01 DOI: 10.1109/ICPADS53394.2021.00051
Ziyu Wang, Cong Sun, Qingsong Yao, Duo Ding, Jianfeng Ma
The recent advance of the Internet of Things and autonomous systems brings massive security threats to networks of low-end embedded devices. Remote attestation is a hardware-assisted technique for verifying the integrity and trustworthiness of software on remote devices. Recently proposed collective remote attestation schemes focus on attesting highly dynamic and disruptive device networks. However, they are generally inefficient because they rely on a homogeneous node setting to keep the aggregation of attestation reports robust. In this work, we propose Delica, an efficient and robust collective attestation framework for dynamic and disruptive networks. We differentiate the roles of provers and aggregators to limit redundant communication and attestation-evidence aggregation for efficiency. Delica can mitigate DoS attacks and detect physical and black-hole attacks. Experimental results and analysis show that Delica greatly reduces the per-node computational cost and cuts the network attestation cost by over 75% compared with state-of-the-art approaches on disruptive networks.
Citations: 0
RTPoW: A Proof-of-Work Consensus Scheme with Real-Time Difficulty Adjustment Algorithm
Pub Date: 2021-12-01 DOI: 10.1109/ICPADS53394.2021.00035
Weijia Feng, Zhenfu Cao, Jiachen Shen, Xiaolei Dong
Bitcoin, the first decentralized cryptocurrency system, uses a simple but effective difficulty adjustment algorithm to stabilize the average block creation time at 10 minutes. Over time, the Bitcoin price has become increasingly volatile, causing the total hashrate (the hash power of the entire network) to fluctuate constantly. Both real-world observations and our experimental results show that Bitcoin's difficulty adjustment algorithm cannot respond in time while the total hashrate fluctuates. Hence, we propose RTPoW, a consensus protocol with a real-time difficulty adjustment algorithm. RTPoW allows the blockchain to adjust the difficulty target of each block by predicting the real-time total hashrate, so the block time remains stable even if the total hashrate fluctuates wildly. To evaluate RTPoW, we implemented a simulator of an experimental environment and tested our algorithm. The results confirm its effectiveness and stability.
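For context, the baseline RTPoW improves on is Bitcoin's windowed retargeting rule: once every 2016 blocks, difficulty is rescaled by the ratio of expected to actual mining time for the window, clamped to a factor of four in either direction. A minimal sketch (RTPoW, by contrast, adjusts every block using a predicted hashrate):

```python
def retarget(old_difficulty, actual_timespan_s,
             expected_timespan_s=2016 * 600):
    """Bitcoin-style difficulty retarget: if the last 2016-block window
    was mined faster than the expected 2016 * 10 minutes, difficulty
    rises proportionally; Bitcoin clamps the adjustment to 4x."""
    ratio = expected_timespan_s / actual_timespan_s
    ratio = max(0.25, min(4.0, ratio))   # clamp to [1/4, 4]
    return old_difficulty * ratio
```

The clamp is what makes the response sluggish when hashrate swings faster than one retarget window, which is the lag this paper targets.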
Citations: 0
Sensitivity loss training based implicit feedback
Pub Date: 2021-12-01 DOI: 10.1109/ICPADS53394.2021.00036
Kunyu Li, Nan Wang, Xinyu Liu
In recommender systems, because explicit feedback features are lacking, models on implicit-feedback datasets are customarily trained on all samples together, without separating them during training or accounting for their non-consistency. This significantly decreases sample utilization and creates challenges for model training. Moreover, little work has explored the intrinsic laws implied in implicit-feedback datasets or how to train implicit feedback data effectively. In this paper, we first summarize how the loss varies during model training for samples with different ratings in an explicit-feedback dataset, and find that model training is highly sensitive to the ratings. Second, we design an adaptive hierarchical training function with dynamic thresholds that effectively distinguishes samples with different ratings, thus optimizing the implicit-feedback dataset into an explicit-feedback dataset to some extent. Finally, to better learn samples with different ratings, we also propose an adaptive hierarchical training strategy to obtain better training results on the implicit-feedback dataset. Extensive experiments on three datasets show that our approach achieves excellent performance and greatly improves the model.
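The core idea of separating implicit-feedback samples by their training loss can be illustrated with a toy dynamic threshold. The exponential-moving-average rule below is our own stand-in, not the paper's actual adaptive hierarchical function, which the abstract does not specify.

```python
def split_by_loss(losses, momentum=0.9, threshold=None):
    """Partition sample indices into a low-loss and a high-loss group by
    comparing each per-sample loss to a dynamically updated threshold
    (illustrative stand-in for loss-sensitive hierarchical training)."""
    low, high = [], []
    thr = threshold if threshold is not None else losses[0]
    for i, loss in enumerate(losses):
        (low if loss <= thr else high).append(i)
        # Dynamic threshold: exponential moving average of observed losses.
        thr = momentum * thr + (1 - momentum) * loss
    return low, high
```

The two groups can then be trained with different weights or schedules, approximating the explicit-feedback separation the paper describes.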
Citations: 0
Boosting Byzantine Protocols in Large Sparse Networks with High System Assumption Coverage
Pub Date: 2021-12-01 DOI: 10.1109/ICPADS53394.2021.00097
Shaolin Yu, Jihong Zhu, Jiali Yang, Yulong Zhan
To improve the overall efficiency and reliability of Byzantine protocols in large sparse networks, we propose a new system assumption for developing multi-scale fault-tolerant systems, with which several kinds of multi-scale Byzantine protocols are developed in large sparse networks with high system assumption coverage. By extending the traditional Byzantine adversary to the multi-scale adversaries, it is shown that efficient deterministic Byzantine broadcast and Byzantine agreement can be built in logarithmic-degree networks. Meanwhile, it is shown that the multi-scale adversary can make a finer trade-off between the system assumption coverage and the overall efficiency of the Byzantine protocols, especially when a small portion of the low-layer small-scale protocols are allowed to fail arbitrarily. With this, efficient Byzantine protocols can be built in large sparse networks with high system reliability.
Citations: 1
FerryLink: Combating Link Degradation for Practical LPWAN Deployments
Pub Date: 2021-12-01 DOI: 10.1109/ICPADS53394.2021.00077
Jing Yang, Zhenqiang Xu, Jiliang Wang
Low-Power Wide-Area Networks (LPWANs) have emerged as a promising technique for providing long-range, low-power communication to large-scale IoT devices. In this paper, however, we show the poor performance of a LoRa network due to its link diversity at both macro and micro scales, through one-month measurements over a 2.2 km × 1.5 km area. We present FerryLink, which exploits this link diversity and leverages peer nodes to ferry data over weak links, to combat performance degradation. Traditional approaches (e.g., building multi-hop networks) are inefficient or too heavyweight for the current star-topology-based LoRa network. FerryLink therefore proposes a novel ferry mechanism combining RSSI sampling and Channel Activity Detection (CAD) to suit the multiple orthogonal transmission parameters of LoRa. To reduce energy overhead, FerryLink leverages convention windows for coarse-grained transmission synchronization between two coupled nodes. Finally, FerryLink utilizes the orthogonality of uplink and downlink signals to avoid the data redundancy introduced by the ferry mechanism, maintaining capacity comparable to original LPWANs. We build FerryLink on top of LoRaWAN with commercial off-the-shelf hardware. Extensive evaluation results show that, compared with original LoRaWAN, FerryLink effectively improves the packet delivery rate (PDR) of LoRa nodes to over 95%, incurs 2x less energy overhead, and increases communication range by 50%.
Citations: 3
Measuring and Modeling Multipath of Wi-Fi to Locate People in Indoor Environments
Pub Date: 2021-12-01 DOI: 10.1109/ICPADS53394.2021.00029
Xiaoyu Ma, Hui He, Hui Zhang, Wei Xi, Zuhao Chen, Jizhong Zhao
With the rapid development of Internet of Things (IoT) technology, the position of indoor occupants has become indispensable information in many fields. Most existing indoor positioning schemes require people to keep moving so that significant signal variance can be detected as the location feature. Hence, this paper proposes Wisite, a passive indoor positioning system based on commodity Wi-Fi that implements indoor multipath signal measurement and static-person positioning modeling. The biggest challenge is detecting dynamic features in the reflection path of a static person to achieve target path matching. To address this, Wisite proposes a MUSIC expectation-maximization (MEM) joint parameter estimation algorithm to estimate and enhance the indoor multipath parameters. Then, a dynamic path matching model based on signal change enhancement (SCE) is proposed to strengthen the signal changes caused by human activity, amplifying the weak changes introduced by respiration when a person is static. Finally, a multipath geometric positioning model is used to calculate the person's position. We implement Wisite using commercial off-the-shelf (COTS) IEEE 802.11n devices and evaluate its performance via extensive experiments in typical real-world scenes. The results show that Wisite outperforms the compared approaches in accuracy and effectiveness, with an average indoor positioning error of less than 0.65 cm.
Citations: 0
Dynamic Push for HTTP Adaptive Streaming with Deep Reinforcement Learning
Pub Date: 2021-12-01 DOI: 10.1109/ICPADS53394.2021.00112
Haipeng Du, Danfu Yuan, Weizhan Zhang, Q. Zheng
HTTP adaptive streaming (HAS) has revolutionized video distribution over the Internet thanks to its prominent benefit of outstanding quality of experience (QoE). Due to the pull-based nature of HTTP/1.1, the client must issue a request for each segment, which usually causes high request overhead and low bandwidth utilization and ultimately reduces QoE. Current research on HAS adaptive bitrate algorithms typically focuses on the server-push feature introduced in the new HTTP standard, which enables the client to receive multiple segments with a single request. Every time a request is sent, the client must simultaneously decide how many segments the server should push and the bitrate of these future segments. As the decision-space complexity increases, existing rule-based strategies inevitably fail to achieve optimal performance. In this paper, we present D-Push, an HAS framework that applies deep reinforcement learning (DRL). Instead of relying on inaccurate assumptions about the environment and network capacity variation models, D-Push trains a DRL model that makes decisions by exploiting the QoE of past decisions during training, and it adapts to a wide range of highly dynamic environments. Experimental results show that D-Push outperforms the existing state-of-the-art algorithm by 12%-24% in terms of average QoE.
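The abstract does not reproduce the QoE objective such a DRL agent would optimize; a common linear form from the ABR literature (bitrate utility minus rebuffering and smoothness penalties, with illustrative coefficients) looks like:

```python
def linear_qoe(bitrates, rebuffer_s, mu=4.3, tau=1.0):
    """Linear QoE commonly used to evaluate ABR policies (not necessarily
    D-Push's exact metric): total bitrate utility, minus mu per second of
    rebuffering, minus tau per unit of bitrate switch between chunks."""
    utility = sum(bitrates)
    rebuf_penalty = mu * sum(rebuffer_s)
    smooth_penalty = tau * sum(abs(b2 - b1)
                               for b1, b2 in zip(bitrates, bitrates[1:]))
    return utility - rebuf_penalty - smooth_penalty
```

A DRL agent can use per-chunk terms of this sum directly as its reward, which is how past QoE feeds back into future push decisions.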
Citations: 0
Highly Scalable Parallel Checksums 高度可扩展的并行校验和
Pub Date : 2021-12-01 DOI: 10.1109/ICPADS53394.2021.00107
Christian Siebert
Checksums are used to detect errors that might occur while storing or communicating data. Checking the integrity of data is well-established, but only for smaller data sets. In contrast, supercomputers have to deal with huge amounts of data, which introduces failures that may remain undetected. Additional protection therefore becomes a necessity at large scale. However, checking the integrity of larger data sets, especially in the case of distributed data, clearly requires parallel approaches. We show how popular checksums, such as CRC-32 or Adler-32, can be parallelized efficiently. This also disproves a widespread belief that parallelizing the aforementioned checksums, especially in a scalable way, is not possible. The mathematical properties behind these checksums enable a method to combine partial checksums such that the result corresponds to the checksum of the concatenated partial data. Our parallel checksum algorithm uses this combination idea in a scalable hierarchical reduction scheme to combine the partial checksums from an arbitrary number of processing elements. Although this reduction scheme can be implemented manually using most parallel programming interfaces, we use the Message Passing Interface, which supports such functionality directly via non-commutative user-defined reduction operations. In conjunction with the efficient checksum capabilities of the zlib library, our algorithm can be implemented not only conveniently and portably, but also very efficiently. Additional shared-memory parallelization within compute nodes completes our hybrid parallel checksum solutions, which show high scalability of up to 524,288 threads. At this scale, computing the checksums of 240 TiB of data took only 3.4 seconds for CRC-32 and 2.6 seconds for Adler-32. Finally, we discuss the APES application as a representative of dynamic supercomputer applications. Thanks to our scalable checksum algorithm, even such applications are now able to detect many errors within their distributed data sets.
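The combination property the abstract relies on can be made concrete for Adler-32, whose two 16-bit halves (a running byte sum A and a running prefix sum B) combine with simple modular arithmetic. The sketch below is a minimal single-process illustration of that property, not the paper's MPI reduction; the function name is an assumption (zlib's C library exposes an equivalent `adler32_combine`, which Python's `zlib` module does not).

```python
import zlib

BASE = 65521  # largest prime below 2**16, the modulus used by Adler-32

def adler32_combine(adler1: int, adler2: int, len2: int) -> int:
    """Combine Adler-32 checksums of two adjacent data blocks.

    adler1: checksum of the first block
    adler2: checksum of the second block (computed from the default seed 1)
    len2:   length in bytes of the second block
    """
    a1, b1 = adler1 & 0xFFFF, (adler1 >> 16) & 0xFFFF
    a2, b2 = adler2 & 0xFFFF, (adler2 >> 16) & 0xFFFF
    # A of the concatenation: the running byte sum continues across the
    # boundary, so the second block's A (minus its initial seed 1) adds on.
    a = (a1 + a2 - 1) % BASE
    # B of the concatenation: each of the len2 prefix sums of block 2 is
    # shifted by (a1 - 1) relative to a standalone computation.
    b = (b1 + b2 + len2 * (a1 - 1)) % BASE
    return (b << 16) | a

left, right = b"partial checksums ", b"combine hierarchically"
combined = adler32_combine(zlib.adler32(left), zlib.adler32(right), len(right))
assert combined == zlib.adler32(left + right)
```

Because this combine operation is associative (though not commutative), partial checksums can be merged pairwise in a reduction tree, which is what makes a scalable hierarchical scheme over many processing elements possible.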
{"title":"Highly Scalable Parallel Checksums","authors":"Christian Siebert","doi":"10.1109/ICPADS53394.2021.00107","DOIUrl":"https://doi.org/10.1109/ICPADS53394.2021.00107","url":null,"abstract":"Checksums are used to detect errors that might occur while storing or communicating data. Checking the integrity of data is well-established, but only for smaller data sets. Contrary, supercomputers have to deal with huge amounts of data, which introduces failures that may remain undetected. Therefore, additional protection becomes a necessity at large scale. However, checking the integrity of larger data sets, especially in case of distributed data, clearly requires parallel approaches. We show how popular checksums, such as CRC-32 or Adler-32, can be parallelized efficiently. This also disproves a widespread belief that parallelizing aforementioned checksums, especially in a scalable way, is not possible. The mathematical properties behind these checksums enable a method to combine partial checksums such that its result corresponds to the checksum of the concatenated partial data. Our parallel checksum algorithm utilizes this combination idea in a scalable hierarchical reduction scheme to combine the partial checksums from an arbitrary number of processing elements. Although this reduction scheme can be implemented manually using most parallel programming interfaces, we use the Message Passing Interface, which supports such a functionality directly via non-commutative user-defined reduction operations. In conjunction with the efficient checksum capabilities of the zlib library, our algorithm can not only be implemented conveniently and in a portable way, but also very efficiently. Additional shared-memory parallelization within compute nodes completes our hybrid parallel checksum solutions, which show a high scalability of up to 524,288 threads. At this scale, computing the checksums of 240 TiB data took only 3.4 seconds for CRC-32 and 2.6 seconds for Adler-32. 
Finally, we discuss the APES application as a representative of dynamic supercomputer applications. Thanks to our scalable checksum algorithm, even such applications are now able to detect many errors within their distributed data sets.","PeriodicalId":309508,"journal":{"name":"2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124986170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Improving Restore Performance of Deduplication Systems via a Greedy Rewriting Scheme 利用贪婪重写方案提高重删系统的恢复性能
Pub Date : 2021-12-01 DOI: 10.1109/ICPADS53394.2021.00042
Lifang Lin, Yuhui Deng, Yi Zhou
Data deduplication has been widely used to improve storage space utilization; however, it is hampered by data fragmentation: logically consecutive chunks physically scattered across various containers. Many rewriting schemes, which rewrite fragmented duplicate chunks into new containers, attempt to alleviate the restore performance degradation caused by fragmentation. Unfortunately, these schemes rely on a fixed threshold and fail to choose the appropriate set of old containers for rewriting, which leaves substantial redundant chunks in the retrieved containers when restoring backups. To address this issue, we propose a flexible-threshold rewriting scheme that improves restore performance while maintaining high backup performance. We define an effectiveness metric - the valid container reference count (VCRC) - that facilitates identifying the appropriate containers for rewriting. We design a greedy algorithm called F-greedy that dynamically adjusts the threshold according to the distribution of containers' VCRC, aiming to rewrite low-VCRC containers. We quantitatively evaluate F-greedy on three real-world backup datasets in terms of restore performance, backup performance, and storage overhead. The empirical results show that compared with two state-of-the-art schemes (Capping and SMR), our scheme improves the restore speed of the existing algorithms by 1.3x - 2.4x while achieving similar backup performance.
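The selection step described above - picking low-VCRC containers under a threshold that adapts to the distribution rather than being fixed - can be sketched as follows. This is a hypothetical illustration under assumed data structures (a chunk-to-container map and a quantile-based threshold); the real F-greedy algorithm's details are not given in the abstract.

```python
from collections import Counter

def choose_containers_to_rewrite(chunk_to_container: dict, quantile: float = 0.2) -> set:
    """Greedy sketch: select containers whose valid container reference
    count (VCRC) falls in the lowest `quantile` of the observed distribution.

    chunk_to_container maps each valid (still-referenced) chunk to the
    container holding it, so counting values yields each container's VCRC.
    """
    vcrc = Counter(chunk_to_container.values())      # container id -> VCRC
    counts = sorted(vcrc.values())
    # The threshold follows the VCRC distribution instead of being fixed:
    # take the VCRC value at the requested quantile of the sorted counts.
    threshold = counts[max(0, int(len(counts) * quantile) - 1)]
    return {c for c, n in vcrc.items() if n <= threshold}

# Containers with few valid references contribute little useful data per
# container read during restore, so rewriting them first is the greedy choice.
refs = {"c1": "A", "c2": "B", "c3": "B", "c4": "C", "c5": "C", "c6": "C"}
print(choose_containers_to_rewrite(refs, quantile=0.4))  # container A only
```

The intuition is that a container referenced by few valid chunks costs a full container read during restore while supplying little data, so rewriting its chunks into fresh containers raises restore locality at modest storage cost.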
{"title":"Improving Restore Performance of Deduplication Systems via a Greedy Rewriting Scheme","authors":"Lifang Lin, Yuhui Deng, Yi Zhou","doi":"10.1109/ICPADS53394.2021.00042","DOIUrl":"https://doi.org/10.1109/ICPADS53394.2021.00042","url":null,"abstract":"Data deduplication has been widely used to improve storage space utilization, however, it is baffled by data fragmen-tation: logically consecutive chunks physically scattered across various containers. Many rewriting schemes, rewriting fragment-ed duplicate chunks into new containers, attempt to alleviate the restore performance degradation caused by fragmentation. Unfortunately, these schemes rely on a fixed threshold and fail to choose the appropriate set of old containers for rewriting, which leads to substantial redundant chunks existing in the retrieved containers when restoring backups. To address this issue, we propose a flexible threshold rewriting scheme to improve restore performance while maintaining high backup performance. We define an effectiveness metric - valid container reference counts (VCRC) - that facilitates identifying the appropriate containers for rewriting. We design a greedy-algorithm-based algorithm called F-greedy that dynamically adjusts the threshold according to the distribution of containers' VCRC, aiming to rewrite low-VCRC containers. We quantitatively evaluate F-greedy on three real-world backup datasets in terms of restore performance, backup performance, and storage overhead. 
The empirical results show that compared with two state-of-the-art schemes (Capping and SMR), our scheme improves the restore speed of the exiting algorithms by 1.3x - 2.4x while achieving similar backup performance.","PeriodicalId":309508,"journal":{"name":"2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)","volume":"204 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122839659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Journal
2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)