Kristoffer Dyrkorn, March 9, 2025

Epilogue

(This article is the last part of a series. You can jump to the previous section if you would like to.)

In this article series and the previous one, we have looked at two methods for rasterizing triangles. One is based on first sampling all pixels inside the axis-aligned bounding box of the triangle, and then drawing the pixels that lie inside the triangle itself. The other is based on first traversing the triangle edges, calculating the start and end coordinates of each horizontal line the triangle covers, and then drawing those lines.

But - which of the two methods is the better one?

It all depends on what you need. The first method (sampling) is very well suited to runtime environments where parallel execution is natively supported, since each pixel can be tested and drawn independently of all the others.

In the examples here we have only drawn single-color triangles, meaning all vertex colors are set to the same value. If more advanced rendering is desired, one would need to assign separate colors, normals or uv coordinates (for texture mapping) to each of the triangle vertices, and then interpolate those values across the triangle when drawing the pixels. The sampling method makes this kind of interpolation easy to implement, as the sketch below shows.
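Here is a minimal sketch of the sampling method with per-vertex color interpolation. This is not the exact code from the series - the function names, parameter layout and framebuffer format are my own assumptions. The point to notice is that the edge function values double as (unnormalized) barycentric coordinates, which makes the interpolation nearly free:

```javascript
// The edge function returns a value proportional to the signed
// distance from the point (px, py) to the edge (a -> b). With a
// consistent winding order, all three values are non-negative for
// points inside the triangle.
function edgeFunction(ax, ay, bx, by, px, py) {
  return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
}

// Draws a triangle with per-vertex colors into an RGBA framebuffer
// (a Uint8ClampedArray) of the given width. v0, v1, v2 are [x, y]
// pairs, and r, g, b hold one color component per vertex.
function drawTriangleSampling(buffer, width, v0, v1, v2, r, g, b) {
  const minX = Math.floor(Math.min(v0[0], v1[0], v2[0]));
  const maxX = Math.ceil(Math.max(v0[0], v1[0], v2[0]));
  const minY = Math.floor(Math.min(v0[1], v1[1], v2[1]));
  const maxY = Math.ceil(Math.max(v0[1], v1[1], v2[1]));

  // Twice the triangle area - used to normalize the edge values
  // into barycentric coordinates.
  const area = edgeFunction(v0[0], v0[1], v1[0], v1[1], v2[0], v2[1]);

  for (let y = minY; y <= maxY; y++) {
    for (let x = minX; x <= maxX; x++) {
      // Each pixel is evaluated independently of all the others -
      // this is the property that makes the method parallel-friendly.
      const w0 = edgeFunction(v1[0], v1[1], v2[0], v2[1], x, y);
      const w1 = edgeFunction(v2[0], v2[1], v0[0], v0[1], x, y);
      const w2 = edgeFunction(v0[0], v0[1], v1[0], v1[1], x, y);

      if (w0 >= 0 && w1 >= 0 && w2 >= 0) {
        // Barycentric interpolation of the vertex colors.
        const offset = (y * width + x) * 4;
        buffer[offset] = (w0 * r[0] + w1 * r[1] + w2 * r[2]) / area;
        buffer[offset + 1] = (w0 * g[0] + w1 * g[1] + w2 * g[2]) / area;
        buffer[offset + 2] = (w0 * b[0] + w1 * b[1] + w2 * b[2]) / area;
        buffer[offset + 3] = 255;
      }
    }
  }
}
```

Note that a production rasterizer would also need a fill rule, so that pixels lying exactly on an edge shared between two triangles are drawn exactly once.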

The scanline conversion method offers a more direct way to draw pixels - it does not sample a region larger than the triangle itself, but works only on the pixels the triangle actually covers. This means it will normally require fewer operations, and thus be faster - if running in a single thread. On my machine (a MacBook Air with an M1 CPU, using Chrome) the scanline rasterizer is roughly 5 times as fast as the sampling method when drawing single-color triangles. This was measured by comparing the average time to draw a triangle, accumulated over many frames, using the same test application for both implementations. I also consider the two implementations to be equally sophisticated and optimized, so the numbers should be reasonably fair and representative. Whether this performance difference actually matters depends on what your needs are. On the other hand, implementing interpolation across many vertex attributes is more complex with this method.
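For comparison, here is a minimal sketch of the scanline approach - again not the exact code from the series, and simplified: a real implementation would step the edge x coordinates incrementally instead of recomputing them, and would handle subpixel precision and fill rules.

```javascript
// Draws a single-color triangle by walking the scanlines from the
// topmost to the bottommost vertex. For clarity, the edge x values
// are recomputed per scanline rather than stepped incrementally.
function drawTriangleScanline(buffer, width, v0, v1, v2, color) {
  // Sort the vertices by y, so a is topmost and c is bottommost.
  const [a, b, c] = [v0, v1, v2].sort((p, q) => p[1] - q[1]);

  // The x coordinate where the edge (p -> q) crosses scanline y.
  const xAt = (p, q, y) => p[0] + ((y - p[1]) * (q[0] - p[0])) / (q[1] - p[1]);

  for (let y = Math.ceil(a[1]); y < c[1]; y++) {
    // The long edge (a -> c) spans the full height of the triangle;
    // the short edge is (a -> b) in the top half, (b -> c) below.
    const xLong = xAt(a, c, y);
    const xShort = y < b[1] ? xAt(a, b, y) : xAt(b, c, y);
    const xStart = Math.ceil(Math.min(xLong, xShort));
    const xEnd = Math.ceil(Math.max(xLong, xShort));

    // Draw the horizontal span - only pixels the triangle covers
    // are ever touched.
    for (let x = xStart; x < xEnd; x++) {
      const offset = (y * width + x) * 4;
      buffer[offset] = color[0];
      buffer[offset + 1] = color[1];
      buffer[offset + 2] = color[2];
      buffer[offset + 3] = 255;
    }
  }
}
```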

Making things even faster

As mentioned, a software rasterizer will never be fast enough. But substantially improving the performance at this stage is not that simple - partly due to the limited options we have in the programming language and runtime platform (JavaScript in the browser). We also have little control over how the JavaScript engine turns our code into machine code on the CPU, or over how CPU work and memory usage are balanced.

One idea for a higher-level optimization is to recognize that edges shared between two triangles are scan converted twice, once for each triangle. If the triangles we draw share many edges (for example, if we draw mesh objects), we could cut the scan conversion work roughly in half by looping through all edges once and scan converting each of them a single time. We would then look up the right pairs of edges (and their respective start and end coordinates) as we draw each triangle. The sketch below outlines the idea.
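This is a hypothetical sketch of that idea, with assumed names and data layout: each unique edge in a mesh is scan converted once, and the resulting per-scanline x coordinates are cached under an order-independent key so that both triangles sharing the edge can look them up.

```javascript
// Builds a cache of scan converted edges for a mesh. vertices is an
// array of [x, y] pairs, triangles an array of [i0, i1, i2] index
// triples. Each unique edge is scan converted exactly once.
function buildEdgeCache(vertices, triangles) {
  const cache = new Map();
  for (const [i0, i1, i2] of triangles) {
    for (const [a, b] of [[i0, i1], [i1, i2], [i2, i0]]) {
      // An order-independent key, so the edge (a, b) and the edge
      // (b, a) - as seen from the neighboring triangle - share one entry.
      const key = a < b ? `${a}-${b}` : `${b}-${a}`;
      if (!cache.has(key)) {
        cache.set(key, scanConvertEdge(vertices[a], vertices[b]));
      }
    }
  }
  return cache;
}

// Steps along the edge (p -> q) with a simple DDA and returns one
// x coordinate per scanline the edge crosses.
function scanConvertEdge(p, q) {
  const [top, bottom] = p[1] <= q[1] ? [p, q] : [q, p];
  const height = bottom[1] - top[1];
  const xs = [];
  if (height === 0) return xs; // horizontal edge - nothing to step over
  const slope = (bottom[0] - top[0]) / height;
  for (let y = Math.ceil(top[1]); y < bottom[1]; y++) {
    xs.push(top[0] + (y - top[1]) * slope);
  }
  return xs;
}
```

When a triangle is then drawn, its three edges are fetched from the cache, and the span endpoints for each scanline are taken as the minimum and maximum of the cached x values.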

The uncertainty in this approach is that reading from and writing to memory is much slower than accessing variables held in registers or on the stack. So although we reduce the CPU work, the actual performance benefit will depend on how well the memory accesses play with the cache. High-performance code that touches a lot of memory must be designed to be cache-friendly.
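To make the layout concern concrete, here is a small illustration (with made-up data) of the difference between a scattered and a contiguous vertex layout:

```javascript
// The same three vertices in two layouts. With separate objects, the
// coordinates end up scattered around the heap:
const verticesAsObjects = [
  { x: 0, y: 0 },
  { x: 10, y: 0 },
  { x: 5, y: 8 },
];

// With a flat typed array they are contiguous in memory, so iterating
// over them in order reads consecutive cache lines:
const verticesFlat = new Float32Array([0, 0, 10, 0, 5, 8]);

// Accessing vertex i in the flat layout:
const getX = (i) => verticesFlat[2 * i];
const getY = (i) => verticesFlat[2 * i + 1];
```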

A different challenge is that modern CPUs are heavily pipelined - so branching can be expensive. We need to make the branch prediction logic in the CPU as successful as possible. Meaning, we should avoid inner loops containing if-statements whose outcomes vary unpredictably.
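As a hypothetical example of what this can look like, here is a data-dependent branch in a clamping operation, and a branch-free formulation of the same thing for 32-bit integers:

```javascript
// Clamping negative values to zero with a branch. The if-statement
// mispredicts when the sign of x varies unpredictably between
// iterations:
function clampBranching(x) {
  if (x < 0) {
    x = 0;
  }
  return x;
}

// A branch-free version for 32-bit integers: x >> 31 is all ones when
// x is negative and all zeros otherwise, so the mask zeroes out
// negative values without branching.
function clampBranchless(x) {
  return x & ~(x >> 31);
}
```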

In any case, the only way to know for sure what works and what does not is to experiment and measure.
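A micro-benchmark along these lines is enough to compare two implementations - drawTriangles here is a stand-in for whatever code you want to measure:

```javascript
// Times a batch of draw calls and reports the average time per
// triangle. drawTriangles is a stand-in for the code under test.
function measure(drawTriangles, triangleCount, rounds = 100) {
  // A few untimed warm-up rounds, so the JIT has compiled and
  // optimized the code before we start measuring.
  for (let i = 0; i < 10; i++) {
    drawTriangles();
  }

  let total = 0;
  for (let i = 0; i < rounds; i++) {
    const start = performance.now();
    drawTriangles();
    total += performance.now() - start;
  }

  const average = total / (rounds * triangleCount);
  console.log(`Average time per triangle: ${average.toFixed(4)} ms`);
}
```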

In the rasterization code I have tried to exploit a feature in V8 (the JavaScript engine in Chrome) where variables of the Number type are represented - and kept - as native 32-bit integers (so-called Small Integers) as long as they only ever hold integer values. This speeds up our math operations tremendously - but it is still an internal engine optimization that we have little control over. It would be useful if there were a way to ensure (or at least verify) that this optimization takes place when running our code. Again, we have limited control over our runtime environment.
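One well-known pattern for keeping values integer-valued is the | 0 truncation, familiar from asm.js-style code. A couple of hypothetical examples - I am not claiming this is exactly what the series code does:

```javascript
// Applying | 0 truncates a Number to a 32-bit integer, so the result
// stays integer-valued. Whether the engine then keeps it as a Small
// Integer is an internal decision we cannot directly observe.

// Without | 0, the division produces a float, and the result may be
// stored as a heap-allocated double:
function midpoint(a, b) {
  return ((a + b) / 2) | 0;
}

// Keeping an accumulator integer-valued throughout a loop. The | 0
// also wraps the value to 32 bits, keeping it inside integer range:
function sumOfSquares(n) {
  let sum = 0;
  for (let i = 0; i < n; i++) {
    sum = (sum + i * i) | 0;
  }
  return sum;
}
```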

I have also tried a few other ideas that did not seem to pay off. Feel free to revisit them if you see more options than I did!

Other implementations

If you want to have a look at other rasterizers based on scanline conversion, please check out the playdate-dither3d rasterizer written by Aras Pranckevičius, which builds on the classic texture mapping articles by Chris Hecker. Another good source is Kurt Fleischer's article “Accurate Polygon Scan Conversion Using Half-Open Intervals” from the book “Graphics Gems III” - the accompanying source code is available on GitHub, and the article itself provides a detailed explanation of the algorithm.



Have fun!


Kristoffer


About the author

Kristoffer Dyrkorn is a Senior Principal Software Engineer at Autodesk. He makes software that renders geodata in architectural visualizations. He can be reached at kristoffer.dyrkorn AT autodesk.com.