J. Hart, N. Carr, Masaki Karneya, Stephen A. Tibbitts, T. J. Coleman
Procedural solid texturing was introduced fourteen years ago, but has yet to find its way into consumer level graphics hardware for real-time operation. To this end, a new model is introduced that yields a parameterized function capable of synthesizing the most common procedural solid textures, specifically wood, marble, clouds and fire. This model is simple enough to be implemented in hardware, and can be realized in VLSI with as little as 100,000 gates. The new model also yields a new method for antialiasing synthesized textures. An expression for the necessary box filter width is derived as a function of the texturing parameters, the texture coordinates and the rasterization variables. Given this filter width, a technique for efficiently box filtering the synthesized texture by either mip mapping the color table or using a summed area color table are presented. Examples of the antialiased results are shown.
{"title":"Antialiased parameterized solid texturing simplified for consumer-level hardware implementation","authors":"J. Hart, N. Carr, Masaki Karneya, Stephen A. Tibbitts, T. J. Coleman","doi":"10.1145/311534.311575","DOIUrl":"https://doi.org/10.1145/311534.311575","url":null,"abstract":"Procedural solid texturing was introduced fourteen years ago, but has yet to find its way into consumer level graphics hardware for real-time operation. To this end, a new model is introduced that yields a parameterized function capable of synthesizing the most common procedural solid textures, specifically wood, marble, clouds and fire. This model is simple enough to be implemented in hardware, and can be realized in VLSI with as little as 100,000 gates. The new model also yields a new method for antialiasing synthesized textures. An expression for the necessary box filter width is derived as a function of the texturing parameters, the texture coordinates and the rasterization variables. Given this filter width, a technique for efficiently box filtering the synthesized texture by either mip mapping the color table or using a summed area color table are presented. Examples of the antialiased results are shown.","PeriodicalId":298241,"journal":{"name":"Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware","volume":"148 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122922451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper we present an algorithm for low-cost hardware antialiasing and transparency. This technique keeps a central Z value along with 8-bit floating-point Z gradients in the X and Y dimensions for each fragment within a pixel (hence the name Z 3). It uses a small fixed amount of storage per pixel. If there are more fragments generated for a pixel than the space available, it merges only as many fragments as necessary in order to fit in the available per-pixel memory. The merging occurs on those fragments having the closest Z values. This combines different fragments from the same surface, resulting in both storage and processing efficiency. When operating with opaque surfaces, Z 3 can provide superior image quality over sparse supersampling methods that use eight samples per pixel while using storage for only three fragments. Z 3 also makes the use of large numbers of samples (e.g., 16) feasible in inexpensive hardware, enabling higher quality images. It is simple to implement because it uses a small fixed number of fragments per pixel. Z can also provide order-independent transparency even if many transparent surfaces are present. Moreover, unlike the original A-buffer algorithm it correctly antialiases interpenetrating transparent surfaces because it has three-dimensional Z information within each pixel.
{"title":"Z3: an economical hardware technique for high-quality antialiasing and transparency","authors":"N. Jouppi, Chun-Fa Chang","doi":"10.1145/311534.311582","DOIUrl":"https://doi.org/10.1145/311534.311582","url":null,"abstract":"In this paper we present an algorithm for low-cost hardware antialiasing and transparency. This technique keeps a central Z value along with 8-bit floating-point Z gradients in the X and Y dimensions for each fragment within a pixel (hence the name Z 3). It uses a small fixed amount of storage per pixel. If there are more fragments generated for a pixel than the space available, it merges only as many fragments as necessary in order to fit in the available per-pixel memory. The merging occurs on those fragments having the closest Z values. This combines different fragments from the same surface, resulting in both storage and processing efficiency. When operating with opaque surfaces, Z 3 can provide superior image quality over sparse supersampling methods that use eight samples per pixel while using storage for only three fragments. Z 3 also makes the use of large numbers of samples (e.g., 16) feasible in inexpensive hardware, enabling higher quality images. It is simple to implement because it uses a small fixed number of fragments per pixel. Z can also provide order-independent transparency even if many transparent surfaces are present. Moreover, unlike the original A-buffer algorithm it correctly antialiases interpenetrating transparent surfaces because it has three-dimensional Z information within each pixel.","PeriodicalId":298241,"journal":{"name":"Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115464065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Extensions to the texture-mapping support of the abstract graphics hardware pipeline and the OpenGL API are proposed to better support programmable shading, with a unified interface, on a variety of future graphics accelerator architectures. Our main proposals include better support for texture map coordinate generation and an abstract, programmable model for multitexturing. As motivation, we survey several interactive rendering algorithms that target important visual phenomena. With hardware implementation of programmable multitexturing support, implementations of these effects that currently take multiple passes can be rendered in one pass. The generality of our proposed extensions enable efficient implementation of a wide range of other interactive rendering algorithms. The intermediate level of abstraction of our API proposal enables high-level shader metaprogramming toolkits and relatively straightforward implementations, while hiding the details of multitexturing support that are currently fragmenting OpenGL into incompatible dialects. CR Categories: 1.3.1 [Computer Graphics]: Hardware Architecture-Graphics Processors; 1.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism-Color, shading, shadowing, and texture.
{"title":"Texture shaders","authors":"M. McCool, W. Heidrich","doi":"10.1145/311534.311585","DOIUrl":"https://doi.org/10.1145/311534.311585","url":null,"abstract":"Extensions to the texture-mapping support of the abstract graphics hardware pipeline and the OpenGL API are proposed to better support programmable shading, with a unified interface, on a variety of future graphics accelerator architectures. Our main proposals include better support for texture map coordinate generation and an abstract, programmable model for multitexturing. As motivation, we survey several interactive rendering algorithms that target important visual phenomena. With hardware implementation of programmable multitexturing support, implementations of these effects that currently take multiple passes can be rendered in one pass. The generality of our proposed extensions enable efficient implementation of a wide range of other interactive rendering algorithms. The intermediate level of abstraction of our API proposal enables high-level shader metaprogramming toolkits and relatively straightforward implementations, while hiding the details of multitexturing support that are currently fragmenting OpenGL into incompatible dialects. CR Categories: 1.3.1 [Computer Graphics]: Hardware Architecture-Graphics Processors; 1.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism-Color, shading, shadowing, and texture.","PeriodicalId":298241,"journal":{"name":"Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128293586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rudrajit Samanta, Jiannan Zheng, T. Funkhouser, Kai Li, J. Singh
Multi-projector systems are increasingly being used to provide large-scale and high-resolution displays for next-generation interactive 3D graphics applications, including large-scale data visualization, immersive virtual environments, and collaborative design. These systems must include a very high-performance and scalable 3D rendering subsystem in order to generate high-resolution images at real-time frame rates. This paper describes a sort-first parallel rendering system for a scalable display wall system built with a network of PCs, graphics accelerators, and portable projectors. The main challenge is to develop scalable algorithms to partition and assign rendering tasks effectively under the performance and functionality constraints of system area networks, PCs, and commodity 3-D graphics accelerators. We have developed three coarse-grained partitioning algorithms, incorporated them into a working prototype system, and run initial experiments aimed at evaluating algorithmic trade-offs and performance bottlenecks in such a system. Results of our experiments indicate that the coarse-grained characteristics of the sort-first architecture are well suited for constructing a parallel rendering system running on a PC cluster.
{"title":"Load balancing for multi-projector rendering systems","authors":"Rudrajit Samanta, Jiannan Zheng, T. Funkhouser, Kai Li, J. Singh","doi":"10.1145/311534.311584","DOIUrl":"https://doi.org/10.1145/311534.311584","url":null,"abstract":"Multi-projector systems are increasingly being used to provide large-scale and high-resolution displays for next-generation interactive 3D graphics applications, including large-scale data visualization, immersive virtual environments, and collaborative design. These systems must include a very high-performance and scalable 3D rendering subsystem in order to generate high-resolution images at real-time frame rates. This paper describes a sort-first parallel rendering system for a scalable display wall system built with a network of PCs, graphics accelerators, and portable projectors. The main challenge is to develop scalable algorithms to partition and assign rendering tasks effectively under the performance and functionality constraints of system area networks, PCs, and commodity 3-D graphics accelerators. We have developed three coarse-grained partitioning algorithms, incorporated them into a working prototype system, and run initial experiments aimed at evaluating algorithmic trade-offs and performance bottlenecks in such a system. Results of our experiments indicate that the coarse-grained characteristics of the sort-first architecture are well suited for constructing a parallel rendering system running on a PC cluster.","PeriodicalId":298241,"journal":{"name":"Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126408405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The creation of high-quality images requires new functionality and higher performance in real-time graphics architectures. In terms of functionality, texture mapping has become an integral component of graphics systems, and in terms of performance, parallel techniques are used at all stages of the graphics pipeline. In rasterization, texture caching has become prevalent for reducing texture bandwidth requirements. However, parallel rasterization architectures divide work across multiple functional units, thus potentially decreasing the locality of texture references. For such architectures to scale well, it is necessary to develop efficient parallel texture caching subsystems. We quantify the effects of parallel rasterization on texture locality for a number of rasterization architectures, representing both current commercial products and proposed future architectures. A cycle-accurate simulation of the rasterization system demonstrates the parallel speedup obtained by these systems and quantities inefficiencies due to redundant work, inherent parallel load imbalance, insufftcient memory bandwidth, and resource contention. We find that parallel texture caching works well, and is general enough to work with a wide variety of rasterization architectures.
{"title":"Parallel texture caching","authors":"Homan Igehy, Matthew Eldridge, P. Hanrahan","doi":"10.1145/311534.311583","DOIUrl":"https://doi.org/10.1145/311534.311583","url":null,"abstract":"The creation of high-quality images requires new functionality and higher performance in real-time graphics architectures. In terms of functionality, texture mapping has become an integral component of graphics systems, and in terms of performance, parallel techniques are used at all stages of the graphics pipeline. In rasterization, texture caching has become prevalent for reducing texture bandwidth requirements. However, parallel rasterization architectures divide work across multiple functional units, thus potentially decreasing the locality of texture references. For such architectures to scale well, it is necessary to develop efficient parallel texture caching subsystems. We quantify the effects of parallel rasterization on texture locality for a number of rasterization architectures, representing both current commercial products and proposed future architectures. A cycle-accurate simulation of the rasterization system demonstrates the parallel speedup obtained by these systems and quantities inefficiencies due to redundant work, inherent parallel load imbalance, insufftcient memory bandwidth, and resource contention. We find that parallel texture caching works well, and is general enough to work with a wide variety of rasterization architectures.","PeriodicalId":298241,"journal":{"name":"Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128210025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper describes a method for occlusion culling in a tiled 3D graphics hardware architecture. Adaptive hierarchical visibility (AHV) is a simplified method for occlusion culling that is integrated into a tiled architecture for hardware rendering. AI-IV constructs a list of polygon bins for each tile where the bins are bucket sorted in order of increasing depth or Z. Polygon bins are rendered starting with the bin closest to the viewer. After some number of bins are rendered, a one layer, hierarchical Zbuffer (HZ) is constructed from the Z-buffer thus far accumulated for the rendered bins. Subsequent bins are rendered by first testing their polygons against the HZ to see if they are hidden. AHV is far simpler to implement in hardware and gives performance that matches or surpasses progressive hierarchical visibility (PHV) methods which update the HZ for each rendered pixel. Results show that AI-IV is superior on scenes with high depth complexity and small polygons. For tiles of widely ranging statistics, AHV competes surprisingly well with PHV. It offers dramatic performance improvement on low cost hardware for scenes of high depth complexity.
{"title":"Adaptive hierarchical visibility in a tiled architecture","authors":"Feng Xie, M. Shantz","doi":"10.1145/311534.311581","DOIUrl":"https://doi.org/10.1145/311534.311581","url":null,"abstract":"This paper describes a method for occlusion culling in a tiled 3D graphics hardware architecture. Adaptive hierarchical visibility (AHV) is a simplified method for occlusion culling that is integrated into a tiled architecture for hardware rendering. AI-IV constructs a list of polygon bins for each tile where the bins are bucket sorted in order of increasing depth or Z. Polygon bins are rendered starting with the bin closest to the viewer. After some number of bins are rendered, a one layer, hierarchical Zbuffer (HZ) is constructed from the Z-buffer thus far accumulated for the rendered bins. Subsequent bins are rendered by first testing their polygons against the HZ to see if they are hidden. AHV is far simpler to implement in hardware and gives performance that matches or surpasses progressive hierarchical visibility (PHV) methods which update the HZ for each rendered pixel. Results show that AI-IV is superior on scenes with high depth complexity and small polygons. For tiles of widely ranging statistics, AHV competes surprisingly well with PHV. It offers dramatic performance improvement on low cost hardware for scenes of high depth complexity.","PeriodicalId":298241,"journal":{"name":"Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware","volume":"2012 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114626490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Joel McCormack, Bob McNamara, C. Gianos, L. Seiler, N. Jouppi, Kenneth W. Correll
High-performance 3D graphics accelerators traditionally require multiple chips on multiple boards, including geometry, rasterizing, pixel processing, and texture mapping chips. These designs are often scalable: they can increase performance by using more chips. Scalability has obvious costs: a minimal configuration needs several chips, and some configurations must replicate texture maps. A less obvious cost is the almost irresistible temptation to replicate chips to increase performance, rather than to design individual chips for higher performance in the first place. In contrast, Neon is a single chip that performs like a multichip design. Neon accelerates OpenGL [19] 3D rendering, as well as X11 [20] and Windows/NT 2D rendering. Since our pin budget limited peak memory bandwidth, we designed Neon from the memory system upward in order to reduce bandwidth requirements. Neon has no special-purpose memories; its eight independent 32-bit memory controllers can access color buffers, Z depth buffers, stencil buffers, and texture data. To fit our gate budget, we shared logic among different operations with similar implementation requirements, and left floating point calculations to Digital's Alpha CPUs. Neon's performance is between HP's Visualize fx4 and fx6, and is well above SGI''s MXE for most operations. Neon-based boards cost much less than these competitors, due to a small part count and use of commodity SDRAMs.
{"title":"Neon: a single-chip 3D workstation graphics accelerator","authors":"Joel McCormack, Bob McNamara, C. Gianos, L. Seiler, N. Jouppi, Kenneth W. Correll","doi":"10.1145/285305.285320","DOIUrl":"https://doi.org/10.1145/285305.285320","url":null,"abstract":"High-performance 3D graphics accelerators traditionally require multiple chips on multiple boards, including geometry, rasterizing, pixel processing, and texture mapping chips. These designs are often scalable: they can increase performance by using more chips. Scalability has obvious costs: a minimal configuration needs several chips, and some configurations must replicate texture maps. A less obvious cost is the almost irresistible temptation to replicate chips to increase performance, rather than to design individual chips for higher performance in the first place. In contrast, Neon is a single chip that performs like a multichip design. Neon accelerates OpenGL [19] 3D rendering, as well as X11 [20] and Windows/NT 2D rendering. Since our pin budget limited peak memory bandwidth, we designed Neon from the memory system upward in order to reduce bandwidth requirements. Neon has no special-purpose memories; its eight independent 32-bit memory controllers can access color buffers, Z depth buffers, stencil buffers, and texture data. To fit our gate budget, we shared logic among different operations with similar implementation requirements, and left floating point calculations to Digital's Alpha CPUs. Neon's performance is between HP's Visualize fx4 and fx6, and is well above SGI''s MXE for most operations. Neon-based boards cost much less than these competitors, due to a small part count and use of commodity SDRAMs.","PeriodicalId":298241,"journal":{"name":"Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115352430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A multiple-port, distributed frame buffer has been recently proposed to support parallel rendering on multicomputers. This paper describes an implementation of such a distributed frame buffer for the Intel Paragon routing network, and reports its performance results. We have conducted several experiments with the system we have developed. Our results indicate that placing a multipleport, distributed frame buffer directly on the host internal routing network can provide high throughput to eliminate the bottleneck of merging a final image from multiple processors to a frame buffer. This architectural approach can also effectively support image composition for sort-last. The synchronization algorithm we have developed requires only one-way communication and minimizes receive overhead for message passing to the frame buffer. CR Categories: B.4.3 [Input/Output]: Subsystems-Parallel I/O; 1.3.1 [Computer Graphics]: Hardware Architecture-Parallel Processing; C.4 [Performance of Systems]: Design Studies.
{"title":"Performance issues of a distributed frame buffer on a multicomputer","authors":"Bin Wei, D. Clark, E. Felten, Kai Li","doi":"10.1145/285305.285316","DOIUrl":"https://doi.org/10.1145/285305.285316","url":null,"abstract":"A multiple-port, distributed frame buffer has been recently proposed to support parallel rendering on multicomputers. This paper describes an implementation of such a distributed frame buffer for the Intel Paragon routing network, and reports its performance results. We have conducted several experiments with the system we have developed. Our results indicate that placing a multipleport, distributed frame buffer directly on the host internal routing network can provide high throughput to eliminate the bottleneck of merging a final image from multiple processors to a frame buffer. This architectural approach can also effectively support image composition for sort-last. The synchronization algorithm we have developed requires only one-way communication and minimizes receive overhead for message passing to the frame buffer. CR Categories: B.4.3 [Input/Output]: Subsystems-Parallel I/O; 1.3.1 [Computer Graphics]: Hardware Architecture-Parallel Processing; C.4 [Performance of Systems]: Design Studies.","PeriodicalId":298241,"journal":{"name":"Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133481254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bezier Triangles As Drawing Primitives J. Bruijns, Philips Research Laboratories’ We propose to use quadratic Bezier triangles as additional drawing primitives: quadratic Bezier triangles require much less model data for faithful representation of curved surfaces than planar triangles. Therefore, they require less storage and/or transmission capacity. Furthermore, they allow automatic level-of-detail. Finally, they result in considerable savings in model-view transformations and lighting calculations. We present two algorithms for rendering these triangles, each of which can be easily incorporated in hardware render systems currently used for planar triangles. CR
J. Bruijns, Philips研究实验室“我们建议使用二次贝塞尔三角形作为额外的绘图基元:二次贝塞尔三角形比平面三角形需要更少的模型数据来忠实地表示曲面。因此,它们需要较少的存储和/或传输容量。此外,它们还允许自动划分细节级别。最后,它们在模型视图转换和光照计算方面节省了大量的时间。我们提出了两种用于渲染这些三角形的算法,每一种算法都可以很容易地结合到当前用于平面三角形的硬件渲染系统中。CR
{"title":"Quadratic Bezier triangles as drawing primitives","authors":"J. Bruijns","doi":"10.1145/285305.285307","DOIUrl":"https://doi.org/10.1145/285305.285307","url":null,"abstract":"Bezier Triangles As Drawing Primitives J. Bruijns, Philips Research Laboratories’ We propose to use quadratic Bezier triangles as additional drawing primitives: quadratic Bezier triangles require much less model data for faithful representation of curved surfaces than planar triangles. Therefore, they require less storage and/or transmission capacity. Furthermore, they allow automatic level-of-detail. Finally, they result in considerable savings in model-view transformations and lighting calculations. We present two algorithms for rendering these triangles, each of which can be easily incorporated in hardware render systems currently used for planar triangles. CR","PeriodicalId":298241,"journal":{"name":"Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126894558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Milton Chen, Gordon Stoll, Homan Igehy, Kekoa Proudfoot, P. Hanrahan
Bucket rendering is a technique in which the framebuffer is subdivided into coherent regions that are rendered independently. The primary benefits of this technique are the decrease in the size of the working set of framebuffer memory required during rendering and the possibility of processing multiple regions in parallel. The drawbacks of this technique are the cost of computing the regions overlapped by each triangle and the redundant work required in processing triangles multiple times when they overlap multiple regions. Tile size is a critical parameter in bucket rendering systems: smaller tile sizes allow smaller memory footprints and better parallel load balancing but exacerbate the problem of redundant computation. In this paper, we use mathematical models, instrumentation, and trace-driven simulation to evaluate the impact of overlap and conclude that the problem of overlap is limited in scope. If triangles are small, the overlap factor itself is also small. If triangles are large, overlap is high but pixel work dominates the rendering time. In pipelined rendering systems, the worst-case impact of overlap occurs when the area of an input triangle is equal to the area for which the pipeline is balanced—that is, the trianglerelated computation time is equal to the pixel-related computation time. Thus, as the current trends of exponentially increasing triangle rate, slowly increasing screen resolution, and increasing per-pixel computation continue to push this balance point toward triangles with smaller area, bucket rendering systems will be able to utilize smaller tiles efficiently. CR
{"title":"Simple models of the impact of overlap in bucket rendering","authors":"Milton Chen, Gordon Stoll, Homan Igehy, Kekoa Proudfoot, P. Hanrahan","doi":"10.1145/285305.285318","DOIUrl":"https://doi.org/10.1145/285305.285318","url":null,"abstract":"Bucket rendering is a technique in which the framebuffer is subdivided into coherent regions that are rendered independently. The primary benefits of this technique are the decrease in the size of the working set of framebuffer memory required during rendering and the possibility of processing multiple regions in parallel. The drawbacks of this technique are the cost of computing the regions overlapped by each triangle and the redundant work required in processing triangles multiple times when they overlap multiple regions. Tile size is a critical parameter in bucket rendering systems: smaller tile sizes allow smaller memory footprints and better parallel load balancing but exacerbate the problem of redundant computation. In this paper, we use mathematical models, instrumentation, and trace-driven simulation to evaluate the impact of overlap and conclude that the problem of overlap is limited in scope. If triangles are small, the overlap factor itself is also small. If triangles are large, overlap is high but pixel work dominates the rendering time. In pipelined rendering systems, the worst-case impact of overlap occurs when the area of an input triangle is equal to the area for which the pipeline is balanced—that is, the trianglerelated computation time is equal to the pixel-related computation time. Thus, as the current trends of exponentially increasing triangle rate, slowly increasing screen resolution, and increasing per-pixel computation continue to push this balance point toward triangles with smaller area, bucket rendering systems will be able to utilize smaller tiles efficiently. CR","PeriodicalId":298241,"journal":{"name":"Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131422007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}