Multi-program benchmark definition
Adam N. Jacobvitz, Andrew D. Hilton, Daniel J. Sorin
2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
DOI: 10.1109/ISPASS.2015.7095786
Published: 2015-03-29
{"title":"多程序基准定义","authors":"Adam N. Jacobvitz, Andrew D. Hilton, Daniel J. Sorin","doi":"10.1109/ISPASS.2015.7095786","DOIUrl":null,"url":null,"abstract":"Although definition of single-program benchmarks is relatively straight-forward-a benchmark is a program plus a specific input-definition of multi-program benchmarks is more complex. Each program may have a different runtime and they may have different interactions depending on how they align with each other. While prior work has focused on sampling multiprogram benchmarks, little attention has been paid to defining the benchmarks in their entirety. In this work, we propose a four-tuple that formally defines multi-program benchmarks in a well-defined way. We then examine how four different classes of benchmarks created by varying the elements of this tuple align with real-world use-cases. We evaluate the impact of these variations on real hardware, and see drastic variations in results between different benchmarks constructed from the same programs. Notable differences include significant speedups versus slowdowns (e.g., +57% vs -5% or +26% vs -18%), and large differences in magnitude even when the results are in the same direction (e.g., 67% versus 11%).","PeriodicalId":189378,"journal":{"name":"2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"108 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"Multi-program benchmark definition\",\"authors\":\"Adam N. Jacobvitz, Andrew D. Hilton, Daniel J. Sorin\",\"doi\":\"10.1109/ISPASS.2015.7095786\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Although definition of single-program benchmarks is relatively straight-forward-a benchmark is a program plus a specific input-definition of multi-program benchmarks is more complex. Each program may have a different runtime and they may have different interactions depending on how they align with each other. While prior work has focused on sampling multiprogram benchmarks, little attention has been paid to defining the benchmarks in their entirety. In this work, we propose a four-tuple that formally defines multi-program benchmarks in a well-defined way. We then examine how four different classes of benchmarks created by varying the elements of this tuple align with real-world use-cases. We evaluate the impact of these variations on real hardware, and see drastic variations in results between different benchmarks constructed from the same programs. 
Notable differences include significant speedups versus slowdowns (e.g., +57% vs -5% or +26% vs -18%), and large differences in magnitude even when the results are in the same direction (e.g., 67% versus 11%).\",\"PeriodicalId\":189378,\"journal\":{\"name\":\"2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)\",\"volume\":\"108 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-03-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISPASS.2015.7095786\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISPASS.2015.7095786","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 9
Abstract
Although the definition of single-program benchmarks is relatively straightforward (a benchmark is a program plus a specific input), the definition of multi-program benchmarks is more complex. Each program may have a different running time, and the programs may interact differently depending on how their executions align with each other. While prior work has focused on sampling multi-program benchmarks, little attention has been paid to defining the benchmarks in their entirety. In this work, we propose a four-tuple that formally defines multi-program benchmarks in a well-defined way. We then examine how four different classes of benchmarks, created by varying the elements of this tuple, align with real-world use cases. We evaluate the impact of these variations on real hardware and see drastic variations in results between different benchmarks constructed from the same programs. Notable differences include significant speedups versus slowdowns (e.g., +57% vs. -5%, or +26% vs. -18%) and large differences in magnitude even when the results point in the same direction (e.g., 67% vs. 11%).
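To make the idea of a tuple-based benchmark definition concrete, here is a minimal Python sketch. The abstract does not enumerate the four elements of the paper's tuple, so the fields below (programs, inputs, completion_policy, measurement_window) are illustrative assumptions, not the authors' actual definition.

```python
from dataclasses import dataclass
from enum import Enum


class CompletionPolicy(Enum):
    """How to handle a program that finishes before its co-runners (assumed element)."""
    RESTART = "restart"      # immediately rerun the finished program
    IDLE = "idle"            # leave its core idle for the rest of the run
    TERMINATE = "terminate"  # end the whole benchmark when any program finishes


@dataclass(frozen=True)
class MultiProgramBenchmark:
    """A hypothetical four-tuple defining a multi-program benchmark.

    The four fields are illustrative guesses at the kinds of elements
    such a tuple needs; the paper's actual definition may differ.
    """
    programs: tuple[str, ...]            # binaries to co-run, e.g. ("mcf", "lbm")
    inputs: tuple[str, ...]              # one input per program
    completion_policy: CompletionPolicy  # what happens when a program finishes early
    measurement_window: tuple[int, int]  # (start, end) of the measured region, in cycles


# Varying a single tuple element yields a different benchmark, even
# though the programs and inputs are identical:
b1 = MultiProgramBenchmark(("mcf", "lbm"), ("in1", "in2"),
                           CompletionPolicy.RESTART, (0, 1_000_000_000))
b2 = MultiProgramBenchmark(("mcf", "lbm"), ("in1", "in2"),
                           CompletionPolicy.IDLE, (0, 1_000_000_000))
assert b1 != b2  # distinct benchmarks built from the same programs
```

Under these assumptions, b1 and b2 are distinct benchmarks even though they co-run the same programs on the same inputs, which is exactly the kind of variation the abstract reports can swing measured results from a significant speedup to a slowdown.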