Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware: latest articles

Antialiased parameterized solid texturing simplified for consumer-level hardware implementation
Pub Date : 1999-07-01 DOI: 10.1145/311534.311575
J. Hart, N. Carr, Masaki Kameya, Stephen A. Tibbitts, T. J. Coleman
Procedural solid texturing was introduced fourteen years ago, but has yet to find its way into consumer-level graphics hardware for real-time operation. To this end, a new model is introduced that yields a parameterized function capable of synthesizing the most common procedural solid textures, specifically wood, marble, clouds and fire. This model is simple enough to be implemented in hardware, and can be realized in VLSI with as few as 100,000 gates. The new model also yields a new method for antialiasing synthesized textures. An expression for the necessary box filter width is derived as a function of the texturing parameters, the texture coordinates and the rasterization variables. Given this filter width, a technique for efficiently box filtering the synthesized texture, by either mip mapping the color table or using a summed-area color table, is presented. Examples of the antialiased results are shown.
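The summed-area color table mentioned in the abstract can be illustrated with a small sketch. It assumes a one-dimensional, single-channel color table indexed by a scalar in [0, 1]; the function names and the clamping at the table ends are illustrative choices, not details taken from the paper.

```python
from itertools import accumulate

def build_summed_table(colors):
    """Prefix sums of a 1D color table: sums[i] = colors[0] + ... + colors[i-1]."""
    return [0.0] + list(accumulate(colors))

def box_filter_lookup(sums, center, width):
    """Average the table entries touched by a box of `width` centered at `center`.

    `center` and `width` are in table coordinates [0, 1]. The cost is two
    table reads and one division, regardless of the filter width.
    """
    n = len(sums) - 1
    lo = max(0, min(n - 1, int((center - width / 2) * n)))
    hi = max(lo + 1, min(n, int((center + width / 2) * n) + 1))
    return (sums[hi] - sums[lo]) / (hi - lo)
```

With a wide filter the lookup returns the table average, and with a zero-width filter it returns the single entry under `center`, which is the behavior an antialiased color-table lookup needs as the filter width varies per pixel.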
Citations: 39
Z3: an economical hardware technique for high-quality antialiasing and transparency
Pub Date : 1999-07-01 DOI: 10.1145/311534.311582
N. Jouppi, Chun-Fa Chang
In this paper we present an algorithm for low-cost hardware antialiasing and transparency. This technique keeps a central Z value along with 8-bit floating-point Z gradients in the X and Y dimensions for each fragment within a pixel (hence the name Z3). It uses a small fixed amount of storage per pixel. If more fragments are generated for a pixel than the available space can hold, it merges only as many fragments as necessary to fit in the available per-pixel memory. The merging occurs on those fragments having the closest Z values. This combines different fragments from the same surface, resulting in both storage and processing efficiency. When operating with opaque surfaces, Z3 can provide superior image quality over sparse supersampling methods that use eight samples per pixel while using storage for only three fragments. Z3 also makes the use of large numbers of samples (e.g., 16) feasible in inexpensive hardware, enabling higher quality images. It is simple to implement because it uses a small fixed number of fragments per pixel. Z3 can also provide order-independent transparency even if many transparent surfaces are present. Moreover, unlike the original A-buffer algorithm, it correctly antialiases interpenetrating transparent surfaces because it has three-dimensional Z information within each pixel.
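The merge-on-overflow rule described above can be sketched as follows. This is a simplified illustration, not the paper's implementation: the `Fragment` fields, the coverage-weighted averaging of depth and color, and the omission of Z gradients and sample masks are all assumptions made to keep the example short. Coverage values are assumed to be positive.

```python
from dataclasses import dataclass

@dataclass
class Fragment:
    z: float         # central depth value
    coverage: float  # fraction of the pixel covered, in (0, 1]
    color: float     # single channel for brevity

def merge_closest(fragments, capacity):
    """Merge the two fragments closest in Z until at most `capacity` remain."""
    frags = sorted(fragments, key=lambda f: f.z)
    while len(frags) > capacity:
        # After sorting by Z, the closest pair is always adjacent.
        i = min(range(len(frags) - 1), key=lambda k: frags[k + 1].z - frags[k].z)
        a, b = frags[i], frags[i + 1]
        cov = a.coverage + b.coverage
        merged = Fragment(
            z=(a.z * a.coverage + b.z * b.coverage) / cov,
            coverage=min(cov, 1.0),
            color=(a.color * a.coverage + b.color * b.coverage) / cov,
        )
        # The merged Z lies between a.z and b.z, so the list stays sorted.
        frags[i:i + 2] = [merged]
    return frags
```

Only as many merges happen as are needed to reach `capacity`, so fragments that are well separated in depth are left untouched.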
Citations: 84
Texture shaders
Pub Date : 1999-07-01 DOI: 10.1145/311534.311585
M. McCool, W. Heidrich
Extensions to the texture-mapping support of the abstract graphics hardware pipeline and the OpenGL API are proposed to better support programmable shading, with a unified interface, on a variety of future graphics accelerator architectures. Our main proposals include better support for texture map coordinate generation and an abstract, programmable model for multitexturing. As motivation, we survey several interactive rendering algorithms that target important visual phenomena. With hardware implementation of programmable multitexturing support, implementations of these effects that currently take multiple passes can be rendered in one pass. The generality of our proposed extensions enables efficient implementation of a wide range of other interactive rendering algorithms. The intermediate level of abstraction of our API proposal enables high-level shader metaprogramming toolkits and relatively straightforward implementations, while hiding the details of multitexturing support that are currently fragmenting OpenGL into incompatible dialects. CR Categories: I.3.1 [Computer Graphics]: Hardware Architecture-Graphics Processors; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism-Color, shading, shadowing, and texture.
Citations: 34
Load balancing for multi-projector rendering systems
Pub Date : 1999-07-01 DOI: 10.1145/311534.311584
Rudrajit Samanta, Jiannan Zheng, T. Funkhouser, Kai Li, J. Singh
Multi-projector systems are increasingly being used to provide large-scale and high-resolution displays for next-generation interactive 3D graphics applications, including large-scale data visualization, immersive virtual environments, and collaborative design. These systems must include a very high-performance and scalable 3D rendering subsystem in order to generate high-resolution images at real-time frame rates. This paper describes a sort-first parallel rendering system for a scalable display wall system built with a network of PCs, graphics accelerators, and portable projectors. The main challenge is to develop scalable algorithms to partition and assign rendering tasks effectively under the performance and functionality constraints of system area networks, PCs, and commodity 3-D graphics accelerators. We have developed three coarse-grained partitioning algorithms, incorporated them into a working prototype system, and run initial experiments aimed at evaluating algorithmic trade-offs and performance bottlenecks in such a system. Results of our experiments indicate that the coarse-grained characteristics of the sort-first architecture are well suited for constructing a parallel rendering system running on a PC cluster.
Citations: 163
Parallel texture caching
Pub Date : 1999-07-01 DOI: 10.1145/311534.311583
Homan Igehy, Matthew Eldridge, P. Hanrahan
The creation of high-quality images requires new functionality and higher performance in real-time graphics architectures. In terms of functionality, texture mapping has become an integral component of graphics systems, and in terms of performance, parallel techniques are used at all stages of the graphics pipeline. In rasterization, texture caching has become prevalent for reducing texture bandwidth requirements. However, parallel rasterization architectures divide work across multiple functional units, thus potentially decreasing the locality of texture references. For such architectures to scale well, it is necessary to develop efficient parallel texture caching subsystems. We quantify the effects of parallel rasterization on texture locality for a number of rasterization architectures, representing both current commercial products and proposed future architectures. A cycle-accurate simulation of the rasterization system demonstrates the parallel speedup obtained by these systems and quantifies inefficiencies due to redundant work, inherent parallel load imbalance, insufficient memory bandwidth, and resource contention. We find that parallel texture caching works well, and is general enough to work with a wide variety of rasterization architectures.
Citations: 35
Adaptive hierarchical visibility in a tiled architecture
Pub Date : 1999-07-01 DOI: 10.1145/311534.311581
Feng Xie, M. Shantz
This paper describes a method for occlusion culling in a tiled 3D graphics hardware architecture. Adaptive hierarchical visibility (AHV) is a simplified method for occlusion culling that is integrated into a tiled architecture for hardware rendering. AHV constructs a list of polygon bins for each tile, where the bins are bucket sorted in order of increasing depth (Z). Polygon bins are rendered starting with the bin closest to the viewer. After some number of bins are rendered, a one-layer hierarchical Z-buffer (HZ) is constructed from the Z-buffer accumulated so far for the rendered bins. Subsequent bins are rendered by first testing their polygons against the HZ to see if they are hidden. AHV is far simpler to implement in hardware and gives performance that matches or surpasses progressive hierarchical visibility (PHV) methods, which update the HZ for each rendered pixel. Results show that AHV is superior on scenes with high depth complexity and small polygons. For tiles of widely ranging statistics, AHV competes surprisingly well with PHV. It offers dramatic performance improvement on low-cost hardware for scenes of high depth complexity.
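The sequence of steps in the abstract (bucket sort by depth, render the nearest bins, build a one-layer HZ, cull the rest against it) can be sketched roughly as below. Everything concrete here is an assumption for illustration: polygons are stood in for by axis-aligned rectangles `(x0, y0, x1, y1, z)` with constant depth in [0, 1), the tile size is a multiple of the HZ block size, and the function names are hypothetical.

```python
FAR = 1.0  # depth of an empty pixel; smaller z is closer to the viewer

def bucket_sort(polys, num_bins):
    """Place each polygon (x0, y0, x1, y1, z) in a depth bin, nearest bin first."""
    bins = [[] for _ in range(num_bins)]
    for p in polys:
        bins[min(num_bins - 1, int(p[4] * num_bins))].append(p)
    return bins

def render(zbuf, poly):
    """Z-buffered fill of the rectangle covered by the polygon."""
    x0, y0, x1, y1, z = poly
    for y in range(y0, y1):
        for x in range(x0, x1):
            zbuf[y][x] = min(zbuf[y][x], z)

def build_hz(zbuf, block):
    """One-layer hierarchical Z: the farthest depth inside each block x block region."""
    n = len(zbuf) // block
    return [[max(zbuf[by * block + j][bx * block + i]
                 for j in range(block) for i in range(block))
             for bx in range(n)] for by in range(n)]

def hidden(hz, block, poly):
    """True if the polygon lies behind the HZ value of every block it touches."""
    x0, y0, x1, y1, z = poly
    return all(z > hz[by][bx]
               for by in range(y0 // block, (y1 - 1) // block + 1)
               for bx in range(x0 // block, (x1 - 1) // block + 1))

def render_tile(polys, size=8, block=4, num_bins=4, bins_before_hz=1):
    """Render one tile; return the final Z-buffer and the number of culled polygons."""
    zbuf = [[FAR] * size for _ in range(size)]
    bins = bucket_sort(polys, num_bins)
    for b in bins[:bins_before_hz]:
        for p in b:
            render(zbuf, p)
    hz = build_hz(zbuf, block)  # built once, not updated per pixel
    culled = 0
    for b in bins[bins_before_hz:]:
        for p in b:
            if hidden(hz, block, p):
                culled += 1
            else:
                render(zbuf, p)
    return zbuf, culled
```

The HZ is built once after the near bins and never updated afterwards, which is the simplification relative to per-pixel HZ updates that the abstract contrasts AHV with.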
Citations: 15
Neon: a single-chip 3D workstation graphics accelerator
Pub Date : 1998-08-01 DOI: 10.1145/285305.285320
Joel McCormack, Bob McNamara, C. Gianos, L. Seiler, N. Jouppi, Kenneth W. Correll
High-performance 3D graphics accelerators traditionally require multiple chips on multiple boards, including geometry, rasterizing, pixel processing, and texture mapping chips. These designs are often scalable: they can increase performance by using more chips. Scalability has obvious costs: a minimal configuration needs several chips, and some configurations must replicate texture maps. A less obvious cost is the almost irresistible temptation to replicate chips to increase performance, rather than to design individual chips for higher performance in the first place. In contrast, Neon is a single chip that performs like a multichip design. Neon accelerates OpenGL [19] 3D rendering, as well as X11 [20] and Windows/NT 2D rendering. Since our pin budget limited peak memory bandwidth, we designed Neon from the memory system upward in order to reduce bandwidth requirements. Neon has no special-purpose memories; its eight independent 32-bit memory controllers can access color buffers, Z depth buffers, stencil buffers, and texture data. To fit our gate budget, we shared logic among different operations with similar implementation requirements, and left floating point calculations to Digital's Alpha CPUs. Neon's performance is between HP's Visualize fx4 and fx6, and is well above SGI's MXE for most operations. Neon-based boards cost much less than these competitors, due to a small part count and use of commodity SDRAMs.
Citations: 38
Performance issues of a distributed frame buffer on a multicomputer
Pub Date : 1998-08-01 DOI: 10.1145/285305.285316
Bin Wei, D. Clark, E. Felten, Kai Li
A multiple-port, distributed frame buffer has been recently proposed to support parallel rendering on multicomputers. This paper describes an implementation of such a distributed frame buffer for the Intel Paragon routing network, and reports its performance results. We have conducted several experiments with the system we have developed. Our results indicate that placing a multiple-port, distributed frame buffer directly on the host internal routing network can provide high throughput to eliminate the bottleneck of merging a final image from multiple processors to a frame buffer. This architectural approach can also effectively support image composition for sort-last. The synchronization algorithm we have developed requires only one-way communication and minimizes receive overhead for message passing to the frame buffer. CR Categories: B.4.3 [Input/Output]: Subsystems-Parallel I/O; I.3.1 [Computer Graphics]: Hardware Architecture-Parallel Processing; C.4 [Performance of Systems]: Design Studies.
Citations: 9
Quadratic Bezier triangles as drawing primitives
Pub Date : 1998-08-01 DOI: 10.1145/285305.285307
J. Bruijns (Philips Research Laboratories)
We propose to use quadratic Bezier triangles as additional drawing primitives: quadratic Bezier triangles require much less model data for faithful representation of curved surfaces than planar triangles. Therefore, they require less storage and/or transmission capacity. Furthermore, they allow automatic level-of-detail. Finally, they result in considerable savings in model-view transformations and lighting calculations. We present two algorithms for rendering these triangles, each of which can be easily incorporated in hardware render systems currently used for planar triangles.
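For readers unfamiliar with the primitive, a quadratic Bezier triangle is defined by six control points and evaluated with degree-2 Bernstein polynomials in barycentric coordinates. The sketch below shows that standard evaluation only; it is not either of the two rendering algorithms from the paper, and the dictionary layout of the control points is an illustrative choice.

```python
def eval_quadratic_bezier_triangle(cp, u, v):
    """Evaluate a quadratic Bezier triangle at barycentric (u, v, w = 1 - u - v).

    `cp` maps the index triples (i, j, k) with i + j + k = 2 to 3D control points:
    three corner points (2,0,0), (0,2,0), (0,0,2) and three edge points
    (1,1,0), (0,1,1), (1,0,1).
    """
    w = 1.0 - u - v
    # Degree-2 Bernstein weights; they sum to (u + v + w)^2 = 1.
    weights = {
        (2, 0, 0): u * u, (0, 2, 0): v * v, (0, 0, 2): w * w,
        (1, 1, 0): 2 * u * v, (0, 1, 1): 2 * v * w, (1, 0, 1): 2 * u * w,
    }
    return tuple(sum(weights[idx] * cp[idx][axis] for idx in weights)
                 for axis in range(3))
```

Six control points describe a curved patch, and the surface can be sampled at any density of (u, v) values, which is what makes a level of detail chosen at render time possible.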
Citations: 14
Simple models of the impact of overlap in bucket rendering
Pub Date : 1998-08-01 DOI: 10.1145/285305.285318
Milton Chen, Gordon Stoll, Homan Igehy, Kekoa Proudfoot, P. Hanrahan
Bucket rendering is a technique in which the framebuffer is subdivided into coherent regions that are rendered independently. The primary benefits of this technique are the decrease in the size of the working set of framebuffer memory required during rendering and the possibility of processing multiple regions in parallel. The drawbacks of this technique are the cost of computing the regions overlapped by each triangle and the redundant work required in processing triangles multiple times when they overlap multiple regions. Tile size is a critical parameter in bucket rendering systems: smaller tile sizes allow smaller memory footprints and better parallel load balancing but exacerbate the problem of redundant computation. In this paper, we use mathematical models, instrumentation, and trace-driven simulation to evaluate the impact of overlap and conclude that the problem of overlap is limited in scope. If triangles are small, the overlap factor itself is also small. If triangles are large, overlap is high but pixel work dominates the rendering time. In pipelined rendering systems, the worst-case impact of overlap occurs when the area of an input triangle is equal to the area for which the pipeline is balanced, that is, when the triangle-related computation time is equal to the pixel-related computation time. Thus, as the current trends of exponentially increasing triangle rate, slowly increasing screen resolution, and increasing per-pixel computation continue to push this balance point toward triangles with smaller area, bucket rendering systems will be able to utilize smaller tiles efficiently.
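The overlap factor discussed above can be made concrete with a small sketch. It counts how many square tiles a triangle's pixel bounding box touches, and gives the expected count for a box placed uniformly at random relative to the tile grid. This is an illustrative bounding-box approximation, not necessarily the exact model used in the paper; the function names and the use of inclusive integer pixel coordinates are assumptions made for this example.

```python
def tiles_overlapped(x0: int, y0: int, x1: int, y1: int, tile: int) -> int:
    """Number of tile x tile buckets touched by a pixel bounding box.

    (x0, y0) and (x1, y1) are inclusive integer pixel coordinates of the
    box corners; tiles are aligned to multiples of `tile`.
    """
    cols = x1 // tile - x0 // tile + 1
    rows = y1 // tile - y0 // tile + 1
    return cols * rows


def expected_overlap(width: float, height: float, tile: float) -> float:
    """Expected number of tiles touched by a width x height box whose
    position is uniformly random (continuous) relative to the tile grid."""
    return (1.0 + width / tile) * (1.0 + height / tile)


if __name__ == "__main__":
    # Small boxes relative to the tile give an overlap factor close to 1;
    # boxes as large as a tile already touch about four tiles on average.
    for size in (2, 8, 32, 128):
        print(size, expected_overlap(size, size, 32))
```

Under this approximation, shrinking the tile size for a fixed triangle size raises the overlap factor, which matches the trade-off the abstract describes between smaller memory footprints and redundant per-triangle work.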
Citations: 35