I'm first in line to shit on AMD, but you're wildly overstating GCN's and RDNA's incompetence and Ampere's advantage, and exaggerating architectural effects on real-world performance in general. AMD sucks at a lot of things, but they're actually pretty good at basic bitch raster performance, which is what Switch 2 will be dependent on 99% of the time. They ain't gonna be raytracing shit on a mobile SoC underclocked into the dirt on a giant-ass 8nm node powered by a watch battery.
I’m not exaggerating anything, these architectures have been micro-benched to hell and back.
GCN is the worst architecture AMD has put out, and it's the historic moment, back in 2012, where they handed Nvidia the lead they've held ever since. It fucked them up that badly.
You think that because it's doing raster, Ampere's second CUDA core sits idle? ~70% of a raster pipeline is compute. Now MORE than ever.
If you want to cut Ampere's TFLOPS, then buckle up, because you have to kneecap the previous platforms' TFLOPS numbers too, since they couldn't even run theirs concurrently with compute.
It ain't gonna be 8nm Samsung. I've done the math on it many times now. It doesn't fit. The 12 SMs and 8 A78C cores take ~7.5B transistors. Just those, nothing else. At 200mm^2 on the same Samsung node as Tegra Orin you would have ~8.5-9B transistors. It doesn't fit; you're far away from having an APU with that kind of transistor budget. Go open up the Van Gogh die shot for the fun of it. That was fully custom for Valve, and look how much area is not dedicated to cores. Nvidia ain't doing a 100% custom solution for Nintendo; that chip will be used elsewhere like usual, a Shield 2 or something, so it'll be more bloated for more general tasks. 560MHz is not even a profile that makes sense on 8nm.
It'll be Samsung's 5LPP or a variant of it, as Nvidia typically gets their own twist on nodes. It's Samsung's 7nm line with adjustments, it's cheap, and it's the remaining node in that category since they removed 7L and 6L. It allows for higher density, though again, Nvidia never goes full density. 60MTr/mm^2 would fit 12B transistors comfortably into 200mm^2. Now you have an APU.
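Back-of-the-envelope version of that budget math, if anyone wants to poke at it. The densities are my ballpark figures for illustration (the 8nm one is just what the ~8.5-9B-at-200mm^2 number above implies), not official foundry specs:

```python
# Rough transistor-budget sanity check using the figures quoted above.
# Densities are ballpark assumptions for illustration, not foundry specs.

DIE_AREA_MM2 = 200        # assumed die size
CORES_ONLY = 7.5e9        # ~12 Ampere SMs + 8 A78C cores, nothing else

nodes = {
    "Samsung 8nm (Orin-class)": 44e6,   # ~8.8B transistors at 200 mm^2
    "Samsung 5LPP-class":       60e6,   # ~12B transistors at 200 mm^2
}

for name, density_per_mm2 in nodes.items():
    budget = density_per_mm2 * DIE_AREA_MM2
    leftover = budget - CORES_ONLY
    print(f"{name}: ~{budget / 1e9:.1f}B total, "
          f"~{leftover / 1e9:.1f}B left for caches, memory, I/O, media blocks")
```

On 8nm that leaves you barely over a billion transistors for everything that isn't a core; on a 5nm-class node you have a few billion to play with, i.e. an actual APU.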
Glad that’s settled
Look at the relatively marginal gains from PS4 to PS5. Sony's own crown-jewel technical wizards Naughty Dog could only get TLoU Part 2 Remastered running at 1440p on PS5, matching the PS4 Pro's ancient GCN cores.
That’s this gen’s diminishing returns. That’s the effort they were willing to make to get a quick $
Certainly nobody in their right mind thinks TLoU Part 2 is peak PS5? Are you suggesting that?
Clearly there was no grand canyon architectural advantage here as you suggest.
You're comparing AMD's Tahiti desktop architecture to RDNA 2 kneecapped without Infinity Cache. It's a neutered dog. It works well enough, but all the advancements AMD made on cache are pretty much culled back. They still didn't get their paradigm shift like everything Ampere got. You still have problems with occupancy, made even worse by the missing Infinity Cache. The Ampere SMs in Tegra remain authentic to the desktop architecture, even if there are fewer of them.
But you say Nvidia's TF's are different, better.
Better occupancy and concurrency. Which is mainly what matters in modern architectures.
Concurrent, asynchronous, cache systems. Way more important than raw flops.
Compared to PS4's Tahiti, in tech advancements it's like comparing a '69 600HP Camaro to a McLaren F1's 600HP. Which one wins the Nürburgring?
Again no, AMD's GPUs have always been competitive in raster performance, they just suck ass at feature set, RT, and upscaling. Base PS5's closest equivalent, the RX 5700, was very competitive with the Ampere RTX 3060 (single-digit percentage gaps on a multi-game average), so again, where exactly is this grand canyon gulf in architectural gains?
So as per your math, 3584/2 = 1792 CUDA cores are keeping up with 2304 shading units, clock for clock, in old games? That's 77% of the 5700's shader count, right?
Now go see how Alan Wake 2 performs between those two, just a taste of how modern architectures differ compute-wise.
In Alan Wake 2's case, a 5700 XT doesn't even keep up with a 2060 Super; ~75% of a 2060 Super's performance, in fact. Mesh shaders.
The 2060 Super, with ~85% of the shader cores, lower clocks, and 7.18 TFLOPS vs the 5700 XT's 9.754 TFLOPS, performs 28% better.
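For reference, those paper TFLOPS figures are just shader count × 2 ops per clock (FMA) × boost clock; quick sanity check, assuming the reference boost clocks (sustained clocks obviously vary):

```python
# Paper-spec FP32 TFLOPS = shader units * 2 ops per clock (FMA) * boost clock (GHz) / 1000.
# Boost clocks below are the reference specs, not sustained in-game clocks.

def tflops(shaders: int, boost_ghz: float) -> float:
    return shaders * 2 * boost_ghz / 1000

print(f"RX 5700 XT:     {tflops(2560, 1.905):.2f} TFLOPS")  # ~9.75
print(f"RTX 2060 Super: {tflops(2176, 1.650):.2f} TFLOPS")  # ~7.18
```

Same formula for both cards, ~26% fewer paper flops on the Turing card, and it still comes out 28% ahead in Alan Wake 2. That's the point.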
Hmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
I bet you'll say "Not fair!" Yeah, that's a generational difference in architectures, and games are only just starting to really use modern architectures and drop cross-gen tech. That includes Cyberpunk 2077, which would have benefitted a lot from this, but they were (stupidly) supporting PS4 GCN, so that's the end result. Alan Wake 2 is the first of many. PS5 & Xbox Series X have that support. Inevitably, the games that push the best graphics of the gen will use it.
A serialized fixed-function pipeline is limited by bandwidth, and if you try to push more triangles through it than it can handle, it doesn't scale. By going fully programmable it scales to all the cores you've got on the GPU. The pipeline becomes "compute-like", and Turing/Ampere loves that. You remove the bottleneck sitting between the vertex shader and the pixel shader of traditional architectures, the one telling you "no, this is the fixed rate of triangles I can output." The new method also optimizes vertex reuse and reduces attribute fetches: because the geometry is broken into tiny meshlets, it stays in cache rather than being fetched from far-away main GPU memory, or worse, system memory. The geometry work happens entirely in-pipe. It's built much like Nanite: big geometry with automatic culling and LODs, plus procedural instancing for hair / vegetation / water, which are geometry intensive. (Rough sketch of the meshlet idea below.)
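If you want the meshlet part in concrete terms, here's a toy sketch. Plain Python, purely illustrative, not NVN or any real graphics API, and the 64-vertex / 124-triangle caps are the commonly published recommendations rather than anything Switch-specific:

```python
# Toy meshlet builder: split a triangle index list into small clusters with
# capped vertex/triangle counts, so each cluster's vertices stay resident in
# on-chip cache and shared vertices get reused instead of re-fetched.

MAX_VERTS = 64    # max unique vertices per meshlet (common recommendation)
MAX_TRIS = 124    # max triangles per meshlet (common recommendation)

def build_meshlets(triangles):
    """triangles: list of (i0, i1, i2) index tuples from the index buffer."""
    meshlets = []
    verts, tris = set(), []
    for tri in triangles:
        new_verts = set(tri) - verts
        # Close the current meshlet if this triangle would bust either limit.
        if len(verts) + len(new_verts) > MAX_VERTS or len(tris) + 1 > MAX_TRIS:
            meshlets.append((sorted(verts), tris))
            verts, tris = set(), []
            new_verts = set(tri)
        verts |= new_verts
        tris.append(tri)
    if tris:
        meshlets.append((sorted(verts), tris))
    return meshlets
```

Each little cluster can then be culled and expanded independently by the programmable front end, which is what lets geometry scale across every SM instead of stalling on one fixed-function unit.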
But why would Switch devs ignore a feature that is embedded in the NVN API? There are big performance boosts from using it. Why would they skip modern feature sets supported by their SoC and kneecap themselves to match PS4 GCN limits? Who would do that?
Like it or not, teraflops actually are a very good gauge of performance, even across architectures.
Wow, what year is this.jpg
Then Xbox Series X is 20% faster than PS5, confirmed! /s
Not 100% exact, but certainly nowhere close to enough of a discrepancy to turn a 2TF part into a 4TF part. Differences at equal teraflop figures are minor at best across architectures, especially only one generation apart.
Games have moved away from raster more and more.
Not even including RT. Modern engines and especially UE5 have a ton of compute.
Where are the PS4 games with Nanite? Lumen? Not even Fortnite on the more powerful PS4 Pro supports it. I don't understand, because even the weaker-TFLOPS Steam Deck has those, even hardware Lumen, which AMD isn't even that good at. Devs just didn't want the money from a PS4 port? All of them? Black Myth: Wukong works on Steam Deck, so where's the PS4 Pro port? Steam Deck is ~1/4 the TFLOPS of a PS4 Pro, so the Pro should be running it at comparatively high res. Imagine: a Steam Deck pushing Windows games through a fucking Proton layer, with the typical PC handheld inefficiencies, still has all of it. There's no reason not to port to the huge-userbase PS4 platform...
Hmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
Then if we add RT: I've seen so many arguments that it won't have RT on Switch because "lol, too weak." A Steam Deck still runs Indiana Jones with RT, while a 5700 gets what, comparatively? A black screen. Ampere RT is much better. Steam Deck runs Metro Exodus EE's RT, Control's reflections, etc. Why would anyone assume Switch can't? Mind-blowing. People who say that are tech illiterate.