I’ve spent some time this week putting Blender’s new Cycles X engine through its paces. The team have written about their plans; I’m excited about them and I appreciate the effort it takes to make this work.
I’m already very happy with the current implementation of Cycles, and with a bit of time on my hands, I thought I’d test all three of my configurations with the new Blender 3.0 Cycles X Alpha.
Changes in Cycles X
The biggest change I can see is that the render tiles have been removed. Previously we could opt to see the full image refined (i.e. progressive updates), or let Cycles render in tiles. Smaller tiles (like 16×16 pixels) would render faster on the CPU, while larger tiles (upwards of 128×128) would render faster on GPUs. This change is nice for those who want to see faster results in the viewport, which typically needs to present the full image rather than finishing one tile at a time. For the final render, however, we no longer have the option to render in tiles.
Another big change is the removal of the default NLM denoiser. Blender’s reasoning is that the new OptiX denoiser outperforms the previous default implementation. OptiX only works with newer NVIDIA GPUs and is not available on AMD and Intel cards. However, Blender 2.92 also includes Intel’s Open Image Denoise (OIDN), which outperforms NLM as well. This means there’s now a better denoiser for everyone, as OIDN runs on the CPU. I’m glad the CPU can have a job working alongside the GPU. Neat!
While the graphs on the Cycles X thread show a significant speed improvement with the Blender test scenes, I thought I’d do some testing with my own scenes, using my various render setups. I’ve compared the current Blender 2.92 release version with the Cycles X eae783e3ce36 build. No denoiser was used in any of these tests.
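For anyone who wants to repeat this kind of comparison, here is a minimal sketch of how a single frame could be timed from the command line. It is not how the numbers in this post were produced; it only assumes Blender's standard `-b` (run without the UI) and `-f` (render a frame) arguments. The function names are made up for illustration, and the sample count and tile size are assumed to be saved in the .blend file.

```python
import subprocess
import time


def build_render_command(blender_bin, blend_file, frame):
    """Return the argument list for a headless single-frame render."""
    # -b runs Blender without the UI, -f renders the given frame and exits.
    return [blender_bin, "-b", blend_file, "-f", str(frame)]


def time_render(blender_bin, blend_file, frame):
    """Run one render and return the wall-clock duration in seconds."""
    cmd = build_render_command(blender_bin, blend_file, frame)
    start = time.perf_counter()
    # check=True raises an error if Blender exits with a failure code.
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start
```

Calling `time_render` once with the path to the 2.92 executable and once with the path to the Cycles X build (both paths depend on your own installation) gives two comparable figures. Note that wall-clock time includes Blender's startup and scene loading, so it will read slightly higher than the render time Blender itself reports.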
Test 1: 2x RTX 2080
On my main Z800 workstation with two RTX 2080 cards, the interior test scene of frame 551 from my opening title sequence rendered a little slower with Cycles X. I’ve run this test with various settings. My first attempt was with my own Cinema Scene at 200 samples:
- Cycles 2.92 with 256×256 tiles: 34 seconds
- Cycles 2.92 with 512×512 tiles: 33 seconds (winner)
- Cycles X 3.00 Alpha: 43 seconds
After seeing the results from my slower render node (see below), I’ve repeated the test with a higher number of samples. My reasoning was that perhaps Cycles X takes a bit of time during the setup, and we might see benefits in render speed the longer it calculates. Here’s another attempt with my Cinema Scene at 2000 samples:
- Cycles 2.92 with 256×256 tiles: 3min 53sec (winner)
- Cycles 2.92 with 512×512 tiles: 3min 59sec
- Cycles X 3.00 Alpha: 5min 30sec
Interestingly, even at a higher sample count, my scene still took much longer in Cycles X. I was wondering if I could reproduce the improvements with one of Blender’s test scenes and tried again, this time with the Junk Shop scene. Here are the results at 200 samples:
- Cycles 2.92 with 256×256 tiles: 52 seconds (winner)
- Cycles 2.92 with 512×512 tiles: 52 seconds (winner)
- Cycles X 3.00 Alpha: 54 seconds
Again Cycles X fares slower on my system, rather than “up to 6x faster” as the Blender internal tests show. Interesting!
The last scene I’ve tried on this setup was the first frame of my Goodbye Mixer animation at 1000 samples:
- Cycles 2.92 with 256×256 tiles: 49sec (winner)
- Cycles 2.92 with 512×512 tiles: 50sec
- Cycles X 3.00 Alpha: 1min 08sec
In all honesty, I was a little disappointed to see this. Thankfully I had some other systems to check out.
Test 2: 1x GTX 970
At first, the above Cinema Scene did not want to work at all with Cycles X and my GTX 970. It took several attempts to get it going, but with a bit of patience I was able to get a faster render speed with Cycles X. This test was done with 200 samples:
- Cycles 2.92 with 128×128 tiles: 4min 20sec
- Cycles 2.92 with 256×256 tiles: 4min 25sec
- Cycles X 3.00 Alpha: 2min 14sec (winner)
I tried the same with my Cracked Landscape scene, again at 200 samples:
- Cycles 2.92 with 128×128 tiles: 2min 22sec
- Cycles 2.92 with 256×256 tiles: 2min 16sec
- Cycles X 3.00 Alpha: 1min 06sec (winner)
Sadly the Junk Shop scene did not fit into the 4GB of my card’s VRAM, so I couldn’t test it with Cycles X. It did work fine in Blender 2.92 though (more on that below).
The last test I did with this system was my Goodbye Mixer scene at 1000 samples:
- Cycles 2.92 with 128×128 tiles: 7min 13sec
- Cycles 2.92 with 256×256 tiles: 7min 02sec
- Cycles X 3.00 Alpha: 2min 38sec (winner)
The last result in particular is seriously exciting: my old GTX 970 is way faster than before, rendering up to 3x faster depending on the scene. This has serious consequences for my render nodes and the way Blender can handle animations. Very impressive!
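To put numbers on those speed-ups, here is a small sketch that converts the timings from the GTX 970 lists above into seconds and divides the best Cycles 2.92 time by the Cycles X time. The helper names are made up for this example.

```python
import re


def to_seconds(text):
    """Convert strings like '7min 02sec', '49sec' or '34 seconds' to seconds."""
    minutes = re.search(r"(\d+)\s*min", text)
    seconds = re.search(r"(\d+)\s*sec", text)
    total = 0
    if minutes:
        total += 60 * int(minutes.group(1))
    if seconds:
        total += int(seconds.group(1))
    return total


def speedup(old, new):
    """How many times faster the new time is compared to the old one."""
    return to_seconds(old) / to_seconds(new)


# Best Cycles 2.92 time vs. Cycles X time on the GTX 970.
gtx_970 = {
    "Cinema Scene": ("4min 20sec", "2min 14sec"),
    "Cracked Landscape": ("2min 16sec", "1min 06sec"),
    "Goodbye Mixer": ("7min 02sec", "2min 38sec"),
}

for scene, (cycles_292, cycles_x) in gtx_970.items():
    print(f"{scene}: {speedup(cycles_292, cycles_x):.2f}x")
```

This works out to roughly 1.9x for the Cinema Scene, 2.1x for the Cracked Landscape and 2.7x for Goodbye Mixer.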
Test 3: 2x Quadro K4000
My other render node had the same issue as the above with the Cinema and Junk Shop scenes: both exceed the 3GB of VRAM on the K4000 cards. It’s just as well; I had almost expected Cycles X to not work on these old ladies anymore. To my surprise they are still supported (although this may change before the final release).
I’ve used a less taxing scene for this test, hence the render times are not meant as a comparison to the above, just as a comparison between the current and the new Cycles implementation. I’ve used my Cracked Landscape scene (frame 781, picked rather randomly) at 200 samples:
- Cycles 2.92 with 128×128 tiles: 3min 41sec
- Cycles 2.92 with 256×256 tiles: 3min 25sec (winner)
- Cycles X 3.00 Alpha: 3min 28sec
Here are the results of my Goodbye Mixer scene at 200 samples:
- Cycles 2.92 with 128×128 tiles: 2min 44sec
- Cycles 2.92 with 256×256 tiles: 2min 41sec
- Cycles X 3.00 Alpha: 1min 56sec (winner)
This one was a bit of a surprise too: Cycles X can render faster with certain scenes, but it really depends. Interesting.
Test 4: 1x RTX 2080
Perplexed by the above findings, it dawned on me that perhaps the multi-GPU implementation is not something that works well in the current version of Cycles X. I decided to return to my main system and repeat the tests, this time disabling one of my RTX 2080 cards.
Cinema Scene at 200 samples:
- Cycles 2.92 with 128×128 tiles: 57sec
- Cycles 2.92 with 256×256 tiles: 55sec (winner)
- Cycles X 3.00 Alpha: 55sec (winner)
Junk Shop Scene at 200 samples:
- Cycles 2.92 with 128×128 tiles: 1min 18sec
- Cycles 2.92 with 256×256 tiles: 1min 16sec
- Cycles X 3.00 Alpha: 44sec (winner)
Goodbye Mixer at 1000 samples:
- Cycles 2.92 with 128×128 tiles: 1min 33sec
- Cycles 2.92 with 256×256 tiles: 1min 30sec
- Cycles X 3.00 Alpha: 1min 05sec (winner)
That settles it: Cycles X fares better in a single GPU environment at the moment. It’s probably just as well, since it’s the configuration most users will have. This may change in future versions of course. Interestingly, the speed increases here are not as mind-blowing as the ones I’ve seen on my GTX 970.
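One way to see the multi-GPU difference is to divide each single-card time from Test 4 by the matching two-card time from Test 1. The sketch below does that with the timings from this post, converted to seconds, using the best tile size of each Cycles 2.92 run (the tile sizes tested differ between the two tests, so treat this as a rough comparison). A value of 2.0 would mean the second card doubles the speed; a value below 1.0 means two cards were slower than one.

```python
# (one RTX 2080, two RTX 2080s) render times in seconds.
times = {
    "Cinema Scene, 200 samples": {"Cycles 2.92": (55, 33), "Cycles X": (55, 43)},
    "Junk Shop, 200 samples": {"Cycles 2.92": (76, 52), "Cycles X": (44, 54)},
    "Goodbye Mixer, 1000 samples": {"Cycles 2.92": (90, 49), "Cycles X": (65, 68)},
}


def scaling(one_gpu, two_gpus):
    """Speed gained by adding the second card (2.0 = twice as fast)."""
    return one_gpu / two_gpus


for scene, engines in times.items():
    for engine, (one_gpu, two_gpus) in engines.items():
        print(f"{scene} / {engine}: {scaling(one_gpu, two_gpus):.2f}x")
```

In these figures Cycles 2.92 gains between about 1.5x and 1.8x from the second card, while Cycles X gains about 1.3x on the Cinema Scene and actually comes out slightly slower with two cards on Junk Shop and Goodbye Mixer.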
Real-time Viewport Previews
The most significant benefit of Cycles X is how fast the rendered viewport displays an image. It’s… FAST! The Blender guys have a nice video comparison on the Cycles X post. It shows exactly what happens on my main system with the RTX 2080 cards. It seriously rocks for previews!
I made this video because the impact is difficult to describe in words. Enjoy!
Absent Features in Cycles X
It’s early days and some features are not implemented in Cycles X yet. One of those is what I’d like to call the VRAM Spill Patch. It’s been in Blender since 2.80 and lets us render scenes that don’t usually fit into a GPU’s VRAM by cleverly swapping textures in and out as necessary during the render process. Rather than showing an error message, we do get a render with a small time penalty.
I’ve tried rendering the Junk Shop and Cinema scenes on my limited K4000 setup, but neither would start rendering in Cycles X. I know the patch has issues with multiple GPUs, but it didn’t want to budge even with one GPU disabled. This leads me to believe that the VRAM Spill Patch is not implemented at this time.
Another currently absent feature is volumetrics. The scene won’t fail to render, but the effect just won’t be applied, leading to very different render results. Here’s a comparison of the Blender Junk Shop scene. On the left is the result from Blender 2.92, on the right is Cycles X.
Conclusion
I’ve spent two days across three computers doing these tests, and it’s been quite an eye-opener for me. Using my own real-life scenes and systems shows mixed results that do not quite compare to the Blender tests. Like one of my supporters said, we don’t all own a high-end GPU from 2018 (i.e. the Quadro RTX 6000 for $4000… although it’s certainly on my wish list).
The card within my reach that currently benefits the most from Cycles X is the GTX 970, a card first released in 2014. It consistently renders about twice as fast compared to the regular Cycles 2.92, even up to 3x faster depending on the scene. That’s seriously unexpected, and a wonderful surprise.
While a single RTX 2080 shows some speed improvements, they’re not as significant as I had expected. Perhaps it’s my own fault for not letting scenes render for longer; a much higher sample count may deliver different results. I may try that another day.
A dual GPU configuration doesn’t seem to give Cycles X as much of an edge as we see in the 2.92 version, where two cards generally translate into “twice as fast”. Having said that, it seems to depend on the scene, as the dual K4000 test shows: sometimes Cycles X is faster, sometimes Cycles 2.92 is faster (but really not by much).
The realtime viewport preview is a huge improvement and I’m looking forward to working with it.
If there’s one wish I have for where Cycles X is going – especially before it’s 100% production ready – then it’s a choice: one Blender with two versions of Cycles. Much like we have multiple denoisers to choose from, perhaps we can have both versions of Cycles in future versions of Blender. As my tests show, it might be beneficial to do a test with either version and see which one fares better for a given scene and GPU combination. Other than that, I can’t wait to see where Cycles X is going. 😎