{"title":"Dynamic dead-instruction detection and elimination","authors":"J. A. Butts, G. Sohi","doi":"10.1145/605397.605419","DOIUrl":null,"url":null,"abstract":"We observe a non-negligible fraction--3 to 16% in our benchmarks--of dynamically dead instructions, dynamic instruction instances that generate unused results. The majority of these instructions arise from static instructions that also produce useful results. We find that compiler optimization (specifically instruction scheduling) creates a significant portion of these partially dead static instructions. We show that most of the dynamically instructions arise from a small set of static instructions that produce dead values most of the time.We leverage this locality by proposing a dead instruction predictor and presenting a scheme to avoid the execution of predicted-dead instructions. Our predictor achieves an accuracy of 93% while identifying over 91% of the dead instructions using less than 5 KB of state. We achieve such high accuracies by leveraging future control flow information (i.e., branch predictions) to distinguish between useless and useful instances of the same static instruction.We then present a mechanism to avoid the register allocation, instruction scheduling, and execution of predicted dead instructions. We measure reductions in resource utilization averaging over 5% and sometimes exceeding 10%, covering physical register management (allocation and freeing), register file read and write traffic, and data cache accesses. Performance improves by an average of 3.6% on an architecture exhibiting resource contention. Additionally, our scheme frees future compilers from the need to consider the costs of dead instructions, enabling more aggressive code motion and optimization. Simultaneously, it mitigates the need for good path profiling information in making inter-block code motion decisions.","PeriodicalId":377379,"journal":{"name":"ASPLOS X","volume":"32 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2002-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"69","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ASPLOS X","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/605397.605419","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 69
Abstract
We observe a non-negligible fraction (3 to 16% in our benchmarks) of dynamically dead instructions, dynamic instruction instances that generate unused results. The majority of these instructions arise from static instructions that also produce useful results. We find that compiler optimization (specifically instruction scheduling) creates a significant portion of these partially dead static instructions. We show that most of the dynamically dead instructions arise from a small set of static instructions that produce dead values most of the time.

We leverage this locality by proposing a dead-instruction predictor and presenting a scheme to avoid the execution of predicted-dead instructions. Our predictor achieves an accuracy of 93% while identifying over 91% of the dead instructions using less than 5 KB of state. We achieve such high accuracies by leveraging future control flow information (i.e., branch predictions) to distinguish between useless and useful instances of the same static instruction.

We then present a mechanism to avoid the register allocation, instruction scheduling, and execution of predicted-dead instructions. We measure reductions in resource utilization averaging over 5% and sometimes exceeding 10%, covering physical register management (allocation and freeing), register file read and write traffic, and data cache accesses. Performance improves by an average of 3.6% on an architecture exhibiting resource contention. Additionally, our scheme frees future compilers from the need to consider the costs of dead instructions, enabling more aggressive code motion and optimization. Simultaneously, it mitigates the need for good path profiling information in making inter-block code motion decisions.
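The abstract describes the predictor only at a high level. The sketch below is a rough illustration, not the paper's design: a table of saturating confidence counters indexed by a hash of the producing instruction's PC and a short signature of predicted future control flow, so useless and useful instances of the same static instruction map to different entries. The table size, counter width, path-signature length, and all class/method names are assumptions made for illustration.

```python
# Hypothetical sketch of a dead-instruction predictor: confidence counters
# indexed by (PC, predicted future path). Parameters are illustrative only;
# 4096 x 2-bit entries happen to fit within the paper's "< 5 KB of state"
# budget, but are not the paper's actual configuration.

class DeadInstructionPredictor:
    def __init__(self, entries=4096, max_count=3, threshold=3):
        self.entries = entries      # number of table entries
        self.max_count = max_count  # saturating-counter ceiling (2-bit)
        self.threshold = threshold  # predict dead only at full confidence
        self.table = [0] * entries

    def _index(self, pc, future_branches):
        # Fold the static PC with a few bits of upcoming branch predictions
        # (taken/not-taken), so the prediction is path-sensitive.
        sig = 0
        for taken in future_branches[:3]:
            sig = (sig << 1) | int(taken)
        return (pc ^ (sig << 9)) % self.entries

    def predict_dead(self, pc, future_branches):
        # True => the result is predicted to go unused, so the pipeline may
        # skip register allocation, scheduling, and execution for it.
        return self.table[self._index(pc, future_branches)] >= self.threshold

    def update(self, pc, future_branches, was_dead):
        # Once the result's liveness is actually known (e.g., at retirement,
        # after the value is overwritten or its last consumer commits),
        # strengthen or reset confidence for this (PC, path) entry.
        i = self._index(pc, future_branches)
        if was_dead:
            self.table[i] = min(self.table[i] + 1, self.max_count)
        else:
            self.table[i] = 0
```

In such a scheme, predict_dead would be consulted early in the pipeline (e.g., at rename, when branch predictions for the upcoming path are available) and update would be invoked at commit, mirroring the prediction/verification split the abstract implies; how the real hardware structures these steps is specified in the paper itself.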