Displacement mapping is an effective technique for encoding the high levels of detail found in today's triangle-based surface models. Extending the hardware rendering pipeline to handle displacement maps as geometric primitives allows highly detailed models to be constructed without requiring large numbers of triangles to be passed from the CPU to the graphics pipeline. We present a new approach based on recursive tessellation that adapts to the surface complexity described by the displacement map. We also ensure that the displaced mesh is tessellated at a resolution appropriate to the current viewpoint. Our tessellation scheme performs all tests only on triangle edges to avoid generating cracks on the displaced surface. The main decision for vertex insertion is based on two comparisons involving the average height surrounding the vertices and the normals at the vertices. Individually, the tests fail to tessellate a mesh satisfactorily, but their combination achieves good results. We propose several additions to the typical hardware rendering pipeline in order to achieve displacement map rendering in hardware. The mesh tessellation is placed within the rendering pipeline so that we can take advantage of the pre-existing vertex transformation units to perform the setup calculations for our view-dependent test. Our method adds only simple arithmetic and comparison operations to the graphics pipeline and makes use of existing units for calculations wherever possible.
{"title":"Adaptive view dependent tessellation of displacement maps","authors":"M. Doggett, Johannes Hirche","doi":"10.1145/346876.348220","DOIUrl":"https://doi.org/10.1145/346876.348220","url":null,"abstract":"Displacement Mapping is an effective technique for encoding the high levels of detail found in today's triangle based surface models. Extending the hardware rendering pipeline to be capable of handling displacement maps as geometric primitives, will allow highly detailed models to be constructed without requiring large numbers of triangles to be passed from the CPU to the graphics pipeline. We present a new approach based on recursive tessellation that adapts to the surface complexity described by the displacement map. We also ensure that the resolution of the displaced mesh is tessellated with respect to the current view point. Our tessellation scheme performs all tests only on triangle edges to avoid generating cracks on the displaced surface. The main decision for vertex insertion is based on two comparisons involving the average height surrounding the vertices and the normals at the vertices. Individually, the tests will fail to tessellate a mesh satisfactorily, but their combination achieves good results. We propose several additions to the typical hardware rendering pipeline in order to achieve displacement map rendering in hardware. The mesh tessellation is placed within the rendering pipeline so that we can take advantage of the pre-existing vertex transformation units to perform the setup calculations for our view dependent test. Our method adds only simple arithmetic and comparison operations to the graphics pipeline and makes use of existing units for calculations wherever possible.","PeriodicalId":298241,"journal":{"name":"Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129848438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Well-known implementations of perspective-correct rendering of planar polygons require a division per rendered pixel. Such a division is best avoided, as it is an expensive operation in terms of silicon gates and clock cycles. In this paper we present a family of efficient midpoint algorithms that can be used to avoid division operators. These algorithms require no more than a small number of additions per pixel. We show how they can be embedded in scan-line algorithms and in algorithms that use mipmaps. Experiments with software implementations show that the division-free algorithms are a factor of two faster, provided that the polygons are not too small. These algorithms are, however, most profitable when realised in hardware.
{"title":"Algorithms for division free perspective correct rendering","authors":"B. Barenbrug, F. Peters, C. V. Overveld","doi":"10.1145/346876.346878","DOIUrl":"https://doi.org/10.1145/346876.346878","url":null,"abstract":"Well known implemetations for perspective correct rendering of planar polygons require a division per rendered pixel. Such a division is better to be avoided as it is an expensive operation in terms of silicon gates and clock cycles. In this paper we present a family of efficient midpoint algorithms that can be used to avoid division operators. These algorithms do not require more than a small number of additions per pixel. We show how these can be embedded in scan line algorithms and in algorithms that use mipmaps. Experiments with software implementations show that the division free algorithms are a factor of two faster, provided that the polygons are not too small. These algorithms are however most profitable when realised in hardware.","PeriodicalId":298241,"journal":{"name":"Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware","volume":"211 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134176045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Existing techniques for traversing a polygon generate fragments one (or more) rows or columns at a time. (A fragment is all the information needed to paint one pixel of the polygon.) This order is non-optimal for many operations. For example, most frame buffers are tiled into rectangular pages, and there is a cost associated with accessing a different page. Pixel processing is more efficient if all fragments of a polygon on one page are generated before any fragments on a different page. Similarly, texture caches have reduced miss rates if fragments are generated in tiles (and even tiles of tiles) whose size depends upon the cache organization. We describe a polygon traversal algorithm that generates fragments in a tiled fashion. That is, it generates all fragments of a polygon within a rectangle (tile) before generating any fragments in another rectangle. For a single level of tiling, our algorithm requires one additional saved context (the values of all interpolator accumulators, such as Z depth, Red, Green, Blue, etc.) over a traditional traversal algorithm based upon half-plane edge functions. An additional level of tiling requires another saved context for the special case of rectangle copies, or three more for the general case. We describe how to use this algorithm to generate fragments in an optimal order for several common scenarios.
{"title":"Tiled polygon traversal using half-plane edge functions","authors":"Joel McCormack, Bob McNamara","doi":"10.1145/346876.346882","DOIUrl":"https://doi.org/10.1145/346876.346882","url":null,"abstract":"Existing techniques for traversing a polygon generate fragments one (or more) rows or columns at a time. (A fragment is all the information needed to paint one pixel of the polygon.) This order is non-optimal for many operations. For example, most frame buffers are tiled into rectangular pages, and there is a cost associated with accessing a different page. Pixel processing is more efficient if all fragments of a polygon on one page are generated before any fragments on a different page. Similarly, texture caches have reduced miss rates if fragments are generated in tiles (and even tiles of tiles) whose size depends upon the cache organization. We describe a polygon traversal algorithm that generates fragments in a tiled fashion. That is, it generates all fragments of a polygon within a rectangle (tile) before generating any fragments in another rectangle. For a single level of tiling, our algorithm requires one additional saved context (the values of all interpolator accumulators, such as Z depth, Red, Green, Blue, etc.) over a traditional traversal algorithm based upon half-plane edge functions. An additional level of tiling requires another saved context for the special case of rectangle copies, or three more for the general case. We describe how to use this algorithm to generate fragments in an optimal order for several common scenarios.","PeriodicalId":298241,"journal":{"name":"Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115321713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we present the RACE II Engine, which uses a hybrid volume rendering methodology that combines algorithmic and hardware acceleration to maximize ray casting performance relative to the total volume memory throughput contained in the system. The challenge for future volume rendering accelerators will be the ability to process higher resolution datasets at over 10 Hz without resorting to large-scale, and therefore expensive, designs. The limiting performance factor for large datasets will be the throughput between the volume memory subsystem and the computational units. Unfortunately, the throughput between memory devices and computational units does not scale with Moore's law. As a result, memory-efficient solutions are needed that maximize the input-output relationship between volume memory throughput and frame rate. The RACE II design takes this approach and achieves an input-output relationship up to 4× larger than that of many solutions proposed in the literature. As a result, this architecture is well suited to meeting the challenges of next-generation datasets.
{"title":"The RACE II engine for real-time volume rendering","authors":"H. Ray, D. Silver","doi":"10.1145/346876.348248","DOIUrl":"https://doi.org/10.1145/346876.348248","url":null,"abstract":"In this paper, we present the RACE II Engine, which uses a hybrid volume rendering methodology that combines algorithmic and hardware acceleration to maximize ray casting performance relative the total amount of volume memory throughput contained in the system. The challenge for future volume rendering accelerators will be the ability to process higher resolution datasets at over 10Hz without utilizing large-scale, and therefore, expensive designs. The limiting performance factor for large datasets will be the throughput between the volume memory subsystem and computational units. Unfortunately, the throughput between memory devices and computational units does not scale with Moore's law. As a result, memory efficient solutions are needed that maximize the input-output relationship between volume memory throughput and frame rate. The RACE II design utilizes this approach and achieves an input-output relationship of up to 4 × larger than many solutions proposed in literature. As a result, this architecture is well suited for meeting the challenges of next generation datasets.","PeriodicalId":298241,"journal":{"name":"Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116403028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hardware acceleration for geometric deformation is developed in the framework of an extension to the OpenGL specification. The method requires an addition to the front end of the OpenGL rendering pipeline and an appropriate OpenGL primitive. Our approach is to implement general geometric deformations so that the system can support additional layers of abstraction, including physically based simulations. This approach would serve a wide range of users with an accelerated implementation of a well-understood deformation method, reducing the need for software deformation engines and the execution-time penalty associated with them.
{"title":"Hardware-accelerated free-form deformation","authors":"C. Chua, U. Neumann","doi":"10.1145/346876.346884","DOIUrl":"https://doi.org/10.1145/346876.346884","url":null,"abstract":"Hardware-acceleration for geometric deformation is developed in the framework of an extension to the OpenGL specification. The method requires an addition to the front-end of the OpenGL rendering pipeline and an appropriate OpenGL primitive. Our approach is to implement general geometric deformations so the system supports additional layers of abstraction, including physically based simulations. This approach would support a wide range of users with an accelerated implementation of a well-understood deformation method, reducing the need for software deformation engines and the execution time penalty associated with them.","PeriodicalId":298241,"journal":{"name":"Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134189339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper a technique is presented that combines interactive, hardware-accelerated bump mapping with shift-variant anisotropic reflectance models. We show an evolutionary path along which some simpler reflectance models can be rendered at interactive rates on current low-end graphics hardware, and how features of future graphics hardware can be exploited for more complex models. We show how our method can be applied to several well-known reflectance models, namely the Banks model, Ward's model, and an anisotropic version of the Blinn-Phong model, but it is not limited to these. Furthermore, we take a close look at the necessary capabilities of the graphics hardware, identify problems with current hardware, and discuss possible enhancements.
{"title":"Towards interactive bump mapping with anisotropic shift-variant BRDFs","authors":"J. Kautz, H. Seidel","doi":"10.1145/346876.348214","DOIUrl":"https://doi.org/10.1145/346876.348214","url":null,"abstract":"In this paper a technique is presented that combines interactive hardware accelerated bump mapping with shift-variant anisotropic reflectance models. An evolutionary path is shown how some simpler reflectance models can be rendered at interactive rates on current low-end graphics hardware, and how features from future graphics hardware can be exploited for more complex models. We show how our method can be applied to some well known reflectance models, namely the Banks model, Ward's model, and an anisotropic version of the Blinn-Phong model, but it is not limited to these models. Furthermore, we take a close look at the necessary capabilities of the graphics hardware, identifiy problems with current hardware, and discuss possible enhancements.","PeriodicalId":298241,"journal":{"name":"Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware","volume":"258 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116145498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The use of a programmable stream architecture in polygon rendering provides a powerful mechanism to address the high performance demands of today's complex scenes as well as the need for flexibility and programmability in the polygon rendering pipeline. We describe how a polygon rendering pipeline maps into data streams and kernels that operate on streams, and how this mapping is used to implement the polygon rendering pipeline on Imagine, a programmable stream processor. We compare our results on a cycle-accurate simulation of Imagine to representative hardware and software renderers.
{"title":"Polygon rendering on a stream architecture","authors":"John Douglas Owens, W. Dally, U. Kapasi, S. Rixner, P. Mattson, B. Mowery","doi":"10.1145/346876.346883","DOIUrl":"https://doi.org/10.1145/346876.346883","url":null,"abstract":"The use of a programmable stream architecture in polygon rendering provides a powerful mechanism to address the high performance needs of today's complex scenes as well as the need for flexibility and programmability in the polygon rendering pipeline. We describe how a polygon rendering pipeline maps into data streams and kernels that operate on streams, and how this mapping is used to implement the polgyon rendering pipeline on Imagine, a programmable stream processor. We compare our results on a cycle-accurate simulation of Imagine to representative hardware and software renderers.","PeriodicalId":298241,"journal":{"name":"Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127021389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present a novel algorithm to evaluate and render Loop subdivision surfaces. The algorithm exploits the fact that Loop subdivision surfaces are piecewise polynomial and uses the forward difference technique to efficiently compute uniform samples on the limit surface. The main advantage of our algorithm is that it requires only a small and constant amount of memory that does not depend on the subdivision depth. The simple structure of the algorithm enables a scalable degree of hardware implementation. By low-level parallelization of the computations, we can reduce the critical computation costs to a theoretical minimum of about one float[3] operation per triangle.
{"title":"Towards hardware implementation of loop subdivision","authors":"Stephan Bischoff, L. Kobbelt, H. Seidel","doi":"10.1145/346876.346886","DOIUrl":"https://doi.org/10.1145/346876.346886","url":null,"abstract":"We present a novel algorithm to evaluate and render Loop subdivision surfaces. The algorithm exploits the fact that Loop subdivision surfaces are piecewise polynomial and uses the forward difference technique for efficiently computing uniform samples on the limit surface. The main advantage of our algorithm is that it only requires a small and constant amount of memory that does not depend on the subdivision depth. The simple structure of the algorithm enables a scalable degree of hardware implementation. By low-level parallelization of the computations, we can reduce the critical computations costs to a theoretical minimum of about one float [3]-operation per triangle.","PeriodicalId":298241,"journal":{"name":"Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123720341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We investigate a new hybrid of the sort-first and sort-last approaches to parallel polygon rendering, using a cluster of PCs as the target platform. Unlike previous methods that statically partition the 3D model and/or the 2D image, our approach performs dynamic, view-dependent and coordinated partitioning of both the 3D model and the 2D image. Using a specific algorithm that follows this approach, we show that it performs better than previous approaches and scales better with both processor count and screen resolution. Overall, our algorithm is able to achieve interactive frame rates with efficiencies of 55.0% to 70.5% in simulations of a system with 64 PCs. While it does have potential disadvantages in client-side processing and in dynamic data management (which also stem from its dynamic, view-dependent nature), these problems are likely to diminish with future technology trends.
{"title":"Hybrid sort-first and sort-last parallel rendering with a cluster of PCs","authors":"Rudrajit Samanta, T. Funkhouser, Kai Li, J. Singh","doi":"10.1145/346876.348237","DOIUrl":"https://doi.org/10.1145/346876.348237","url":null,"abstract":"We investigate a new hybrid of sort-first and sort-last approach for parallel polygon rendering, using as a target platform a cluster of PCs. Unlike previous methods that statically partition the 3D model and/or the 2D image, our approach performs dynamic, view-dependent and coordinated partitioning of both the 3D model and the 2D image. Using a specific algorithm that follows this approach, we show that it performs better than previous approaches and scales better with both processor count and screen resolution. Overall, our algorithm is able to achieve interactive frame rates with efficiencies of 55.0% to 70.5% during simulations of a system with 64 PCs. While it does have potential disadvantages in client-side processing and in dynamic data management—which also stem from its dynamic, view-dependent nature—these problems are likely to diminish with technology trends in the future.","PeriodicalId":298241,"journal":{"name":"Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware","volume":"332 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115977977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As networks get faster, it becomes more feasible to render large data sets remotely. For example, it is useful to run large scientific simulations on remote compute servers but visualize the results of those simulations on one or more local displays. The WireGL project at Stanford is researching new techniques for rendering over a network. For many applications, we can render remotely over a gigabit network to a tiled display with little or no performance loss over running locally. One of the elements of WireGL that makes this performance possible is our ability to track the graphics state of a running application. In this paper, we will describe our techniques for tracking state, as well as efficient algorithms for computing the difference between two graphics contexts. This fast differencing operation allows WireGL to transmit less state data over the network by updating server state lazily. It also allows our system to context switch between multiple graphics applications several million times per second without flushing the hardware accelerator. This results in substantial performance gains when sharing a remote display between multiple clients.
{"title":"Tracking graphics state for networked rendering","authors":"I. Buck, G. Humphreys, P. Hanrahan","doi":"10.1145/346876.348233","DOIUrl":"https://doi.org/10.1145/346876.348233","url":null,"abstract":"As networks get faster, it becomes more feasible to render large data sets remotely. For example, it is useful to run large scientific simulations on remote compute servers but visualize the results of those simulations on one or more local displays. The WireGL project at Stanford is researching new techniques for rendering over a network. For many applications, we can render remotely over a gigabit network to a tiled display with little or no performance loss over running locally. One of the elements of WireGL that makes this performance possible is our ability to track the graphics state of a running application. In this paper, we will describe our techniques for tracking state, as well as efficient algorithms for computing the difference between two graphics contexts. This fast differencing operation allows WireGL to transmit less state data over the network by updating server state lazily. It also allows our system to context switch between multiple graphics applications several million times per second without flushing the hardware accelerator. This results in substantial performance gains when sharing a remote display between multiple clients.","PeriodicalId":298241,"journal":{"name":"Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133520604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}