Applications of GF100’s Compute Hardware
Last but certainly not least are the changes to gaming afforded by the improved compute/shader hardware. NVIDIA believes that by announcing GF100’s compute abilities so far ahead of its gaming abilities, it has given potential customers the wrong idea about the company’s direction. NVIDIA is certainly increasing its focus on the GPGPU market, but as it is trying its hardest to point out, most of that compute hardware has a use in gaming too.
Much of this is straightforward: the compute hardware is the same hardware that processes pixel and vertex shader commands, so the additional CUDA cores in the GF100 give it much more shader power than the GT200. We also have DirectCompute, which can use the compute hardware to quickly do some things that couldn’t be done quickly via shader code, such as Self Shadowing Ambient Occlusion in games like Battleforge, or to take an NVIDIA example, the depth-of-field effect in Metro 2033.
Perhaps the single biggest improvement for gaming that comes from NVIDIA’s changes to the compute hardware is the benefit afforded to compute-like tasks in games. PhysX plays a big part here; along with DirectCompute, it’s going to be one of the biggest uses of compute abilities when it comes to gaming.
NVIDIA is heavily promoting the idea that GF100’s concurrent kernels and fast context switching abilities are going to be of significant benefit here. With concurrent kernels, different PhysX simulations can start without waiting for other SMs to complete the previous simulation. With fast context switching, the GPU can switch from rendering to PhysX and back again while wasting less time on the context switch itself. The result is that there’s going to be less overhead in using the compute abilities of GF100 during gaming, be it for PhysX, Bullet Physics, or DirectCompute.
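To make the concurrent kernel argument concrete, here is a toy scheduling model in Python. It is a sketch under stated assumptions, not a description of how GF100's scheduler actually works: the kernel list, the SM counts each kernel occupies, and the durations are all hypothetical, and the function names are invented for illustration. The only hardware figure used is a pool of 16 SMs. The model compares running kernels strictly one after another with letting each kernel start as soon as enough SMs are free.

```python
import heapq


def serialized_time(kernels):
    """Total time when each kernel must wait for the previous one to finish."""
    return sum(duration for _sms, duration in kernels)


def concurrent_time(kernels, total_sms):
    """Total time when kernels start, in order, as soon as enough SMs are free.

    kernels is a list of (sms_needed, duration) pairs; all values are
    hypothetical and only illustrate the idea.
    """
    now = 0
    free = total_sms
    running = []  # min-heap of (finish_time, sms_held)
    for sms, duration in kernels:
        if sms > total_sms:
            raise ValueError("kernel needs more SMs than the GPU has")
        # Retire running kernels, earliest finisher first, until this one fits.
        while free < sms:
            finish, held = heapq.heappop(running)
            now = max(now, finish)
            free += held
        heapq.heappush(running, (now + duration, sms))
        free -= sms
    # The last kernel to finish determines the total time.
    return max((finish for finish, _held in running), default=now)


# Hypothetical workload: three small simulations, then one kernel that
# needs the whole chip. Each entry is (SMs occupied, duration in ms).
example_kernels = [(4, 2), (4, 3), (8, 1), (16, 2)]

if __name__ == "__main__":
    print("serialized:", serialized_time(example_kernels), "ms")
    print("concurrent:", concurrent_time(example_kernels, total_sms=16), "ms")
```

In this made-up workload the three small kernels overlap instead of queuing, so the total drops from the sum of all durations to the length of the longest small kernel plus the full-chip kernel. Real gains would depend entirely on how much of the GPU each simulation actually occupies.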
NVIDIA is big on pushing specific examples here in order to entice developers into using these abilities, and a number of demo programs will be released along with GF100 cards to showcase them. Most interesting among these is a ray tracing demo that NVIDIA is showing off. Ray tracing is something even G80 could do (albeit slowly), but we find this an interesting way for NVIDIA to go, since promoting ray tracing puts them in direct competition with Intel, which has been showing off ray tracing demos running on CPUs for years. Ray tracing nullifies NVIDIA’s experience in rasterization, so promoting its use is one of the riskier things they can do in the long term.
NVIDIA's car ray tracing demo
At any rate, the demo program they are showing off is a hybrid program that showcases the use of both rasterization and ray tracing for rendering a car. As we already know from the original Fermi introduction, GF100 is supposed to be much faster than GT200 at ray tracing, thanks in large part to the L1 cache architecture of GF100. In the demo we saw of a GF100 card next to a GT200 card, the GF100 card performed roughly 3x as well as the GT200 card. This specific demo still runs at less than a frame per second (0.63 on the GF100 card), so it’s by no means true real-time ray tracing, but it’s getting faster all the time. For lower quality ray tracing, this would certainly be doable in real-time.
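The arithmetic behind those figures can be sketched in a few lines of Python. The 0.63 frames per second and the roughly 3x advantage come from the demo described above; the 30 frames per second real-time target is an assumption chosen for illustration, as are the variable and function names. The GT200 figure is only an estimate derived from the 3x ratio, not a measured number.

```python
GF100_FPS = 0.63           # frame rate reported for the GF100 card in the demo
SPEEDUP_OVER_GT200 = 3.0   # "roughly 3x" as stated for the same demo
REALTIME_TARGET_FPS = 30.0 # assumption: a common threshold for "real-time"


def frame_time_seconds(fps):
    """Seconds needed to render one frame at the given frame rate."""
    return 1.0 / fps


# Estimated GT200 frame rate, derived from the approximate 3x ratio.
gt200_fps_estimate = GF100_FPS / SPEEDUP_OVER_GT200

# How many times faster the demo would need to run to hit the assumed target.
speedup_needed = REALTIME_TARGET_FPS / GF100_FPS

if __name__ == "__main__":
    print(f"GF100 frame time: {frame_time_seconds(GF100_FPS):.2f} s")
    print(f"GT200 frame time (estimate): {frame_time_seconds(gt200_fps_estimate):.2f} s")
    print(f"Speedup needed for {REALTIME_TARGET_FPS:.0f} fps: {speedup_needed:.1f}x")
```

Under these assumptions a single frame takes a bit over a second and a half on GF100, and the demo at this quality level would need to run dozens of times faster to reach the assumed real-time target.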
Dark Void's turbulence in action
NVIDIA is also showing off several other demos of compute for gaming, including a PhysX fluid simulation, the new PhysX APEX turbulence effect on Dark Void, and an AI path finding simulation that we did not have a chance to see. Ultimately PhysX is still NVIDIA’s bigger carrot for consumers, while the rest of this is to entice developers to make use of the compute hardware through whatever means they’d like (PhysX, OpenCL, DirectCompute). Outside of PhysX, heavy use of the GPU compute abilities is still going to be some time off.
115 Comments
Stas - Tuesday, January 19, 2010 - link
all that hype just sounds awful for nVidia. I hope they don't leave us for good. I like AMD but I like competition more :)
SmCaudata - Monday, January 18, 2010 - link
The 50% larger die size will kill them. Even if the reports of lower yields are false they will have to get a much smaller profit margin on their cards than AMD to stay competitive. As it is the 5870 can run nearly any game on a 30" monitor with everything turned up at a playable rate. The target audience for anything more than a 5870 is absurdly small. If Nvidia does not release a mainstream card the only people that are going to buy this beast are the people that have been looking for a reason not to buy an AMD card all along.
In the end I think Nvidia will lose even more market share this generation. Across the board AMD has the fastest card at every price point. That will not change and with the dual GPU card already out from ATI it will be a long time before Nvidia has the highest performing card because I doubt they will release a dual GPU card at launch if they are having thermal issues with a single GPU card.
BTW... I've only ever owned Nvidia cards but that will likely change at my next system build even after this "information."
Yojimbo - Monday, January 18, 2010 - link
what do you mean by "information"?
SmCaudata - Monday, January 18, 2010 - link
Heh. Just that it was hyped up so much and we really didn't get much other than some architectural changes. I suppose that maybe this is really interesting to some, but I've seen a lot of hardware underperform early spec based guesses.
The Anandtech article was great. The information revealed by Nvidia was just okay.
qwertymac93 - Monday, January 18, 2010 - link
I really hope fermi doesn't turn into "nvidias 2900xt". late, hot, and expensive. while i doubt it will be slow by any stretch of the imagination, i hope it isn't TOO hot and heavy to be feasible. i like amd, but nvidia failing is not good for anybody. higher prices(as we've seen) and slower advancements in technology hurt EVERYONE.
alvin3486 - Monday, January 18, 2010 - link
Nvidia GF100 pulls 280W and is unmanufacturable, details it wont talk about publicly
swaaye - Monday, January 18, 2010 - link
Remember that they talked all about how wondrous NV30 was going to be too. This is marketing folks. They can have the most amazing eye popping theoretical paper specs in the universe, but if it can't be turned into something affordable and highly competitive, it simply doesn't matter.
Put another way, they haven't been delaying it because it's so awesome the world isn't ready for it. Look deeper. :D
blowfish - Monday, January 18, 2010 - link
This was a great read, but it made my head hurt!
I wonder how it will scale, since the bulk of the market is for more mainstream cards. (the article mentioned lesser derivatives having less polymorph engines)
Can't wait to see reviews of actual hardware.
Zool - Monday, January 18, 2010 - link
I am still curious why nvidia is pushing geometry so hard. At 850 MHz, Cypress should be able to produce 850 million polygons/s at one triangle per clock. That's 14 million per frame max at 60fps, which is quite unrealistic. That's more than 7 triangles per pixel at 1920*1050. Putting that amount of geometry into a single pixel is quite a waste and also bottlenecks performance. U just won't see the difference.
That's why amd/ati is also pushing adaptive tessellation, which can reduce the tessellation level with compute shader LOD to fit a reasonable amount of triangles per pixel.
I can push the tessellation factor to 14 in the dx9 ATI tessellation SDK demo and reach 100fps, or put it on 3 and reach 700+ fps with almost zero difference.
Zool - Tuesday, January 19, 2010 - link
Also want to note that tessellation alone is not enough and u always use displacement mapping too. Not to mention u change the whole rendered scene to be more shader demanding (shadows, lighting), so too much tessellation (like in Unigine Heaven on almost everything, where without tessellation even stairs are flat) can really cause a big shader hit.
If u compare the graphics quality before and after tessellation in Unigine Heaven, i would rather ask what the hell is taking away that much performance without tessellation, as everything looks so flat, like in a 10y old engine.
The increased geometry setup should bring little to no performance advantage for gf100; the main fps push is the much more efficient shaders with the new cache architecture, and the more than doubled shader count of course.