{"title":"并行网格生成中硬件事务内存的有效利用","authors":"Tetsu Kobayashi, Shigeyuki Sato, H. Iwasaki","doi":"10.1109/ICPP.2015.69","DOIUrl":null,"url":null,"abstract":"Efficient transactional executions are desirable for parallel implementations of algorithms with graph refinements. Hardware transactional memory (HTM) is promising for easy yet efficient transactional executions. Long HTM transactions, however, abort with high probability because of hardware limitations. Unfortunately, Delaunay mesh refinement (DMR), which is an algorithm with graph refinements for mesh generation, causes long transactions. Its parallel implementation naively based on HTM therefore leads to poor performance. To utilize HTM efficiently for parallel implementation of DMR, we present an approach to shortening transactions. Our HTM based implementations of DMR achieved significantly higher throughput and better scalability than a naive HTM-based one and lock-based ones. On a quad-core Has well processor, the absolute speedup of one of our implementations was up to 2.64 with 16 threads.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"29 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Efficient Use of Hardware Transactional Memory for Parallel Mesh Generation\",\"authors\":\"Tetsu Kobayashi, Shigeyuki Sato, H. Iwasaki\",\"doi\":\"10.1109/ICPP.2015.69\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Efficient transactional executions are desirable for parallel implementations of algorithms with graph refinements. Hardware transactional memory (HTM) is promising for easy yet efficient transactional executions. Long HTM transactions, however, abort with high probability because of hardware limitations. Unfortunately, Delaunay mesh refinement (DMR), which is an algorithm with graph refinements for mesh generation, causes long transactions. Its parallel implementation naively based on HTM therefore leads to poor performance. To utilize HTM efficiently for parallel implementation of DMR, we present an approach to shortening transactions. Our HTM based implementations of DMR achieved significantly higher throughput and better scalability than a naive HTM-based one and lock-based ones. On a quad-core Has well processor, the absolute speedup of one of our implementations was up to 2.64 with 16 threads.\",\"PeriodicalId\":423007,\"journal\":{\"name\":\"2015 44th International Conference on Parallel Processing\",\"volume\":\"29 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 44th International Conference on Parallel Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICPP.2015.69\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 44th International Conference on Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPP.2015.69","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Efficient Use of Hardware Transactional Memory for Parallel Mesh Generation
Efficient transactional executions are desirable for parallel implementations of algorithms with graph refinements. Hardware transactional memory (HTM) is promising for easy yet efficient transactional executions. Long HTM transactions, however, abort with high probability because of hardware limitations. Unfortunately, Delaunay mesh refinement (DMR), which is an algorithm with graph refinements for mesh generation, causes long transactions. Its parallel implementation naively based on HTM therefore leads to poor performance. To utilize HTM efficiently for parallel implementation of DMR, we present an approach to shortening transactions. Our HTM based implementations of DMR achieved significantly higher throughput and better scalability than a naive HTM-based one and lock-based ones. On a quad-core Has well processor, the absolute speedup of one of our implementations was up to 2.64 with 16 threads.