
PlayStation 5 Pro Unboxed, 16.7 TFLOPs GPU Compute Confirmed

FireFly

Member
Is Kepler running the PlayStation engineering department or something? Why should we believe his words over those of the actual chief architect of the machine? No one is saying the entire 300 TOPs comes from that officially mentioned custom block, by the way. I also believe that PS5 PRO has RDNA 3 dual-issue FP32, as the leaked developer documentation mentioned. This leaked Brazilian manual spec list doesn't invalidate it.
Why can't custom hardware refer to the SWMMAC support that is coming to RDNA 4?

 

twilo99

Member
At this point, one would hope it's clear that it is pointless to judge a console on specs alone.

After years of arguments about PS5 vs XSX, the latest "impressive" non-binary game runs a bit better on PS5, and nobody even finds it odd that the supposedly much weaker console is the one performing a bit better.

Most 3rd party games are “enhanced” for PlayStation so comparing performance is rather pointless?
 

Lysandros

Member
Why can't custom hardware refer to the SWMMAC support that is coming to RDNA 4?

I am not saying that PS5 PRO doesn't have SWMMAC at all. I am only sticking to the official statements. By "custom hardware for ML" or "an actual custom silicon for AI upscaling" I understand a custom/specific hardware component, akin to PS4 PRO's hardware ID buffer or PS5's Cache Scrubbers. Sony/Cerny didn't call PS4 PRO's RPM feature "custom" for a reason: it was a standard feature of AMD's next/future architecture (Vega).
 
Last edited:

FireFly

Member
I am not saying that PS5 PRO doesn't have SWMMAC at all. I am only sticking to the official statements. By "custom hardware for ML" or "actual custom silicon for AI upscaling" I understand a custom/specific hardware component, akin to PS4 PRO's hardware ID buffer or PS5's Cache Scrubbers. Sony/Cerny didn't call PS4 PRO's RPM feature "custom" for a reason: it was a standard feature of AMD's next/future architecture (Vega).
It depends on how you look at it. Maybe Sony asked for that feature and AMD also decided to include it in RDNA 4. If FSR 4 is only 9-12 months old, as claimed, it wouldn't have influenced any hardware design.

Edit: And in any case even if Cerny is referring to upscaling hardware that doesn't exist in RDNA 4, that still doesn't confirm that this hardware is located outside of the CUs.
 
Last edited:

SmokSmog

Member
[image attachment]


Here

 
Last edited:

Lysandros

Member
Edit: And in any case even if Cerny is referring to upscaling hardware that doesn't exist in RDNA 4, that still doesn't confirm that this hardware is located outside of the CUs.
I mean, FireFly... Again, the second quote from the developer is literally "an actual custom silicon AI upscaler." Sounds an awful lot like a dedicated block, don't you think? How would it be singular if it were inside the CUs?
 
Yeah, that's why he worked on PlayStation World mag *rollseyes*

Nah that was just Rich infiltrating enemy lines.

that is simply not true. first of all, CPUs that adjust their clock speed weren't always a thing. up until the late 2000s, PC CPUs all ran at a set clock speed that only changed if the user intervened by manually changing it.

secondly, every single console except the PS5 has a locked clock speed for both the GPU and the CPU. on Switch these clock speeds can be set by the developers, who can choose between 3 clock speed profiles in handheld mode, and are locked to a single profile in docked mode.

game consoles are not PCs. a game console has to always be predictable for the developers, and a locked clock speed is the only way to do that.
the PS5 also always clocks as high as it can. it only adjusts to a negligible degree, and only if it hits its TDP limit, so it's almost not worth taking note of.

Okay, so lemme clarify something. Yes, older CPUs did run at fixed clocks, but they also had DIP switches/jumpers to adjust clock settings via multipliers. And the thing is, while older CPUs ran at fixed clocks, that approach was not efficient long-term because it meant CPUs couldn't save on power consumption. Which, back in the day, was less of a problem because CPUs in older generations consumed less power even at full clocks.

But when laptops started becoming mainstream, and when CPUs started consuming a lot more power, things shifted to variable clocks to make the chips more power-efficient. Which should hint at why PS5's approach is actually the better of the two: it's more power-efficient in the long run, so the CPU isn't wasting more energy than it needs on any given task. With the Switch, a fixed-clock approach works because it's a very low-power device. The different clock settings you're talking about are tied to different power profiles.

Now technically, the CPUs in Series X and PS5 are based on mobile Ryzen CPUs, so they're already pretty good at power efficiency. Sony just wanted even more power efficiency, so they implemented variable clocks and SmartShift. Microsoft went with lower GPU clocks, and that's probably why they felt they could get away with fixed clocks on their CPU. But considering a game isn't always going to be taxing the CPU and GPU simultaneously, it probably makes more sense to shift the power budget between the CPU and GPU as needed.

And, since PS5 is shifting its power budget around, it "can't" lock its CPU clock. Yet their CPU is still more efficient than Series X's, because they offloaded a ton of the I/O operations to dedicated silicon rather than relying on CPU cycles to process them. You can try saying this leads to unpredictable performance, but we've seen more than enough multiplat games over the past four years showing that isn't really an issue, and, more often than not, PS5 having slight performance advantages over Series X. Games know generally where the upper limits of the CPU will be if the GPU is under full load, and vice versa. Otherwise, if the game is running logic that doesn't require full clocks, the CPU won't run at full clocks and saves power along the way.

So if I was wrong about anything, it was assuming how the Series CPUs functioned. I had to go look up some info on that front; that's my bad. But, in hindsight, I think it just shows why, with systems as high-performance as the 9th-gen consoles, a dynamic power profile and clocks work out better in the long term, when you want to save as much on power consumption as possible. Sony had a performance profile that necessitated a different approach, and it just so happens they made the better choice. They got something equal to Series X (sometimes better) in the majority of 3P games, while having a cheaper APU and an even cheaper CPU (they cut out some of the unneeded units).

However, if PS5 had a weaker GPU and still did variable clocks, AND it didn't have the dedicated I/O silicon or other customizations...then yes, the fixed clocks approach of Series X would unquestionably be the better solution of the two.
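The shared-budget idea being argued here can be sketched as a toy model. Everything below is hypothetical and purely illustrative of the shifting concept; real SmartShift reallocates power continuously from activity counters, not from a simple proportional split like this:

```python
# Toy model of a SmartShift-style shared power budget.
# All numbers are made up for illustration; this is NOT actual PS5 behavior.
TOTAL_BUDGET_W = 200  # hypothetical total SoC power budget

def split_budget(cpu_demand, gpu_demand, total=TOTAL_BUDGET_W):
    """Grant each block the power it asks for, scaling both down if over budget."""
    demand = cpu_demand + gpu_demand
    if demand <= total:
        return cpu_demand, gpu_demand  # under budget: both run full tilt
    # Over budget: scale both proportionally (clocks would drop slightly).
    scale = total / demand
    return cpu_demand * scale, gpu_demand * scale

# GPU-heavy frame: CPU wants 60 W, GPU wants 180 W -> both scaled to fit 200 W.
cpu_w, gpu_w = split_budget(60, 180)
print(f"CPU {cpu_w:.0f} W, GPU {gpu_w:.0f} W")
```

The point of the sketch is the last line of the function: when one block is idle, the other gets the full budget, which is the argument the post above is making for variable clocks.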
 
Last edited:

FireFly

Member
I mean FireFly... Again, the second quote from the developer is literally "an actual custom silicon AI upscaler." Sounds awfully alot like a dedicated block don't you think? How it would be singular if inside of CUs?
I agree, though that's not coming from Cerny. I was previously a believer in a custom A.I block, but from a die size perspective it makes sense to re-use as much as possible, and you can hit 300 TOPs using sparsity + dual-issue. But it would require a high boost clock, so we will see.
 
Last edited:
As has been said so many times in this forum, including in this thread, AMD presents RDNA3 TFLOPs while accounting for VOPD.
But VOPD is almost useless for games, so even Sony decided to present the value without it.

That's heavy cope. If Sony could market this thing as 33.4 teraflops, they absolutely would. It's clearly custom: based off the RX 6800, with just the ML and RT from RDNA 4. It won't have any RDNA 3 or 4 features unless they are specific to ML or RT; it's basically identical to the RX 6800. That's not a criticism, before you all start getting salty and crying. Go look at the specs of the RX 6800: it's literally the 11th best gaming GPU on the market right now, and with the RDNA 4 RT and ML added it will likely be top 6.
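For reference, both the 16.7 and 33.4 figures fall out of the standard shader math; a quick sketch (assuming the leaked 60 CU / ~2.17 GHz configuration, which is not an officially confirmed pairing):

```python
# Theoretical FP32 throughput for an RDNA-style GPU.
# Assumed figures: 60 CUs at ~2.17 GHz (the leaked PS5 Pro configuration).
def tflops(cus, clock_ghz, lanes_per_cu=64, flops_per_lane=2, dual_issue=False):
    """lanes * 2 (FMA = multiply + add) * clock, optionally doubled for VOPD."""
    t = cus * lanes_per_cu * flops_per_lane * clock_ghz / 1000
    return t * 2 if dual_issue else t

single = tflops(60, 2.17)                    # ~16.7 TFLOPs, the number Sony quotes
dual   = tflops(60, 2.17, dual_issue=True)   # ~33.3 TFLOPs with VOPD counted
print(f"{single:.1f} / {dual:.1f}")
```

The same GPU produces both numbers; the only difference is whether the dual-issue doubling is counted.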
 

winjer

Gold Member
That's heavy cope. If Sony could market this thing as 33.4 teraflops, they absolutely would. It's clearly custom: based off the RX 6800, with just the ML and RT from RDNA 4. It won't have any RDNA 3 or 4 features unless they are specific to ML or RT; it's basically identical to the RX 6800. That's not a criticism, before you all start getting salty and crying. Go look at the specs of the RX 6800: it's literally the 11th best gaming GPU on the market right now, and with the RDNA 4 RT and ML added it will likely be top 6.

You have so very little understanding of hardware, it's impressive.
 
Sure mate, I've only been a hardware engineer for 25 years. You're the one coping. It's not directly equivalent to any PC GPU; it's a custom GPU, but it's clearly based off the RDNA 2 RX 6800. The fact that Sony isn't publishing the 33.4 teraflop number is because it doesn't have VOPD, and it likely won't have anything else from RDNA 3 or 4 unless it's specific to the RT and ML functions.
 

Lysandros

Member
I agree, though that's not coming from Cerny. I was previously a believer in a custom A.I block, but from a die size perspective it makes sense to re-use as much as possible, and you can hit 300 TOPs using sparsity + dual-issue. But it would require a high boost clock, so we will see.
VOPD + sparsity leaves us at 267 TOPs, and I very much doubt any sane person would round 267 up to 300; the difference is just too big. Also, 33 TOPs from a dedicated ML block would offer much higher efficiency and be much less theoretical than 33 TOPs obtained from CU-based VOPD + sparsity, as you would imagine. In that case 1 TOPs ≠ 1 TOPs, or rather 1 TOPs > 1 TOPs.
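For what it's worth, the 267 figure falls out of stacking the usual marketing multipliers on the 16.7 TFLOPs base rate; a quick sketch (assuming INT8 at 4x the FP32 rate, with 2x for dual-issue and 2x for structured sparsity):

```python
# INT8 TOPs derived from the FP32 base rate, with the usual multipliers.
# Assumptions: INT8 runs at 4x the FP32 rate, VOPD doubles it, sparsity doubles again.
BASE_FP32_TFLOPS = 16.7

def int8_tops(fp32_tflops, int8_ratio=4, dual_issue=True, sparsity=True):
    tops = fp32_tflops * int8_ratio
    if dual_issue:
        tops *= 2
    if sparsity:
        tops *= 2
    return tops

print(int8_tops(BASE_FP32_TFLOPS))  # ~267.2 -- well short of a round 300
```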
 

winjer

Gold Member
Sure mate, I've only been a hardware engineer for 25 years. You're the one coping. It's not directly equivalent to any PC GPU; it's a custom GPU, but it's clearly based off the RDNA 2 RX 6800. The fact that Sony isn't publishing the 33.4 teraflop number is because it doesn't have VOPD, and it likely won't have anything else from RDNA 3 or 4 unless it's specific to the RT and ML functions.

I call BS on that. You don't even understand what VOPD is, and how AMD uses it to claim higher theoretical TFLOP numbers than what is achieved in gaming workloads.
Do yourself a favour and look up AMD's implementation of VOPD (Vector Operation, Dual-issue).
 

Pedro Motta

Gold Member
That's heavy cope, if Sony could market this thing as 33.4 Teraflops they absolutely would, it's clearly custom, based off RX 6800 and just has ML and RT from RDNA 4, it won't have any RDNA 3 or 4 features unless they are specific to ML or RT, it's basically identical to the RX 6800, that's not a criticism before you all start getting salty and crying, go look at the specs of the RX 6800, it's literally the 11th best Gaming GPU on the market right now and with the RDNA 4 RT and ML added it will likely be top 6.
So Sony should market their console with triple the TFlops of the original, while not providing 3x the performance? Are you mental? There is a reason they market the 16.7 TFlops.
 
Well it does; that's the documented compute of the 7600 XT, which is 22 TFLOPs dual-issue. That's why I made the distinction between RDNA 2 and RDNA 3 teraflops. In the same way, you can't compare teraflops directly between AMD and Nvidia or Intel; the instruction sets are different.
 
I am still trying to get my head around the fact that some people thought it was going to be 33 TF, as in TF equivalent to the PS5's and not dual-issue. Three times the GPU power, really, for $700? Like a 4070? That alone is the price of the PS5 Pro.
 

FrankWza

Member
I don't know what's currently causing more seething on the internet - the PS5 Pro or Trump or Kamala. The comments under all the unboxing videos sound like Cerny fucked all their moms and sisters. What makes a person so angry at something they claim to have zero interest in?
It has gotten worse since I posted this. Now they're seeing that PSSR is legit and the Pro is beastly. I'm afraid for my friends when GTA VI comes out.
I love that this console exists just to see the feelings and emotions it brings out. People are so angry that it's $700. Don't buy it, or wait for a price drop.
 

muno

Neo Member
RDNA3 cards have higher flops on paper only, for the most part. They have a "dual-issue" capability which pretty much doubles the maximum theoretical throughput; however, it relies heavily on the compiler to find dual-issue opportunities, which turn out to be very few in reality. Real-world compute performance isn't much better than RDNA2 in most cases.

Sony putting 16.7 TFLOPs on the spec sheet was a good call because, in reality, the raster performance will be very close to that of a 16.7 TFLOP RDNA 2 equivalent GPU.

Don't take the specs at face value. It's more nuanced, more of an "it depends" situation.
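The compiler-dependency point is the crux. A toy illustration of why pairing opportunities are scarce: two adjacent operations can only co-issue when the second doesn't read the first's result, and real VOPD adds many more encoding restrictions on top (legal opcode pairs, register banks). This little counter is purely illustrative:

```python
# Toy illustration of why dual-issue depends on the compiler finding pairs:
# adjacent ops can co-issue only if the second doesn't read the first's result.
# Real VOPD has far more restrictions (opcode pairs, register banks, etc.).
def count_dual_issues(ops):
    """ops: list of (dest, src_a, src_b). Greedily pair adjacent independent ops."""
    pairs, i = 0, 0
    while i < len(ops) - 1:
        dest, _, _ = ops[i]
        _, a, b = ops[i + 1]
        if dest not in (a, b):  # independent: the two ops can issue together
            pairs += 1
            i += 2
        else:                   # dependent: the second op must wait
            i += 1
    return pairs

# A dependency chain pairs nothing; fully independent ops pair every time.
chain = [("v0", "v1", "v2"), ("v3", "v0", "v2"), ("v4", "v3", "v2")]
indep = [("v0", "v1", "v2"), ("v3", "v4", "v5"), ("v6", "v7", "v8"), ("v9", "v10", "v11")]
print(count_dual_issues(chain), count_dual_issues(indep))  # 0 2
```

Shader code tends to look more like `chain` than `indep`, which is why the doubled paper number rarely shows up in games.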
 

Lysandros

Member
RDNA3 cards have higher flops on paper only, for the most part. They have a "dual-issue" capability which pretty much doubles the maximum theoretical throughput; however, it relies heavily on the compiler to find dual-issue opportunities, which turn out to be very few in reality. Real-world compute performance isn't much better than RDNA2 in most cases.

Sony putting 16.7 TFLOPs on the spec sheet was a good call because, in reality, the raster performance will be very close to that of a 16.7 TFLOP RDNA 2 equivalent GPU.

Don't take the specs at face value. It's more nuanced, more of an "it depends" situation.
Don't you think the feature could be more useful, in a possibly improved state, in a fixed-spec APU? Couldn't games be programmed to make better use of it, in a similar fashion to RPM, with a lower-level API? Even a ~10% uplift would be significant.
 
Last edited:

Loxus

Member
AMD already confirmed that the AI Accelerators in RDNA3 are similar to Nvidia's Tensor Cores, and we'll be seeing the second generation of the AI Accelerators in RDNA4.

AMD plans to harness the power of AI to transform gaming with its next-gen GPUs
In a recent interview with the Japanese gaming website 4gamer, the AMD execs detailed some of what we can expect from RDNA 4. Naturally, front and center was confirmation that we’ll be seeing the second iteration of Team Red’s AI Accelerator cores (similar to Nvidia’s Tensor cores), which were first introduced in the current-gen RDNA 3 GPUs - such as the excellent Radeon RX 7900 XTX.

It's also confirmed dual-issue will continue in RDNA4.

WMMA is done using the AI Accelerators.
From my understanding, the below image shows how the AI Accelerators work by utilizing the SIMD32.
[Image: RDNA 3 dual SIMD32 diagram]


As seen above in the dual SIMD32, one SIMD32 can do Float or Int, while the other can do only Float.

With dual-issue, the Float/Int SIMD32 would be utilized by the AI Accelerators and the Float only SIMD32 would be utilized for normal utilization.

This would explain why Sony shows the PS5 Pro as having 16.7 TF.
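For context on what those accelerators actually execute: a WMMA instruction computes D = A×B + C on 16x16 tiles. A pure-Python sketch of the math only (the function name is illustrative; on hardware the tiles are packed into a wave's vector registers at FP16/INT8 precision, which is where the extra throughput comes from):

```python
# What a single WMMA 16x16x16 instruction computes: D = A @ B + C on 16x16 tiles.
# Pure-Python sketch of the math only; the function name is illustrative.
N = 16

def wmma_16x16x16(A, B, C):
    """A, B, C: 16x16 lists of lists. Returns D = A*B + C."""
    return [[C[i][j] + sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

# Sanity check: identity times B, plus zero, gives B back.
I = [[1 if i == j else 0 for j in range(N)] for i in range(N)]
B = [[i + j for j in range(N)] for i in range(N)]
Z = [[0] * N for _ in range(N)]
assert wmma_16x16x16(I, B, Z) == B
```

A network like an upscaler is essentially a long stream of these tile operations, which is why how (and where) they are issued matters so much for the TOPs numbers being debated.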
 

shamoomoo

Banned
I agree, though that's not coming from Cerny. I was previously a believer in a custom A.I block, but from a die size perspective it makes sense to re-use as much as possible, and you can hit 300 TOPs using sparsity + dual-issue. But it would require a high boost clock, so we will see.
There's nothing stopping a small block from accounting for some of the TOPs performance.

AMD combines the AI of their GPU and the AI engine on Strix Point.
 
Last edited:
AMD already confirmed that the AI Accelerators in RDNA3 are similar to Nvidia's Tensor Cores, and we'll be seeing the second generation of the AI Accelerators in RDNA4.

AMD plans to harness the power of AI to transform gaming with its next-gen GPUs
In a recent interview with the Japanese gaming website 4gamer, the AMD execs detailed some of what we can expect from RDNA 4. Naturally, front and center was confirmation that we’ll be seeing the second iteration of Team Red’s AI Accelerator cores (similar to Nvidia’s Tensor cores), which were first introduced in the current-gen RDNA 3 GPUs - such as the excellent Radeon RX 7900 XTX.

It's also confirmed dual-issue will continue in RDNA4.

WMMA is done using the AI Accelerators.
From my understanding, the below image shows how the AI Accelerators work by utilizing the SIMD32.
[Image: RDNA 3 dual SIMD32 diagram]


As seen above in the dual SIMD32, one SIMD32 can do Float or Int, while the other can do only Float.

With dual-issue, the Float/Int SIMD32 would be utilized by the AI Accelerators and the Float only SIMD32 would be utilized for normal utilization.

This would explain why Sony shows the PS5 Pro as having 16.7 TF.

I'll add some important context here about RDNA 3's AI cores, as I know there was a lot of confusion regarding this a while back.

Just as a side note, I had a discussion with someone who specialises in graphics, hardware and programming at a professional level (I don't want to go into more detail than that). I asked him about the hardware acceleration for AI on RDNA 3, as I know there have been several debates regarding this subject on GAF; some have argued it's a fully dedicated AI core like the Tensor cores in Nvidia cards, others have argued it's repurposed shader units designed to run ML code.

Anyway, this was his response: "It has ML acceleration hardware, but it is directly integrated into the stream processors, so it can't run at the same time as regular FMA code. This differs from Nvidia's Tensor Cores, which can run independently but require the same scheduler as the other pipelines, so the scheduler can't issue regular FMA instructions at the same time. This also differs from a Neural Processing Unit, which is nearly fully independent, and only the main CPU thread needs to pass an independent thread to it."

Make of that what you will, but it should give us an interesting insight into some of the potential AI capabilities of the Pro in regards to potential upsampling technology.
 

Arioco

Member
People focus too much on the 1060 he used, but the reality is that in the 120Hz mode, at 1080p or lower and with low detail settings, with cross-gen titles, it becomes a CPU test.
It would not matter much whether he used a 1060 or a 3090. More often than not, the CPU would be the limiting factor.
So it becomes a comparison between the 3300X and the PS5 CPU.
The 3300X does have fewer cores, but it has higher clocks and much more L3 cache per CCD.
And this shows, as the 3300X gets better 1% lows.


Not to mention the PS5 version does NOT have a 1080p mode without RT; the only mode that runs at 1080p is the RT performance mode, and all the other modes run at much higher resolutions. So I never understood how this comparison could have been made at 1080p with matched settings, as Gamers Nexus claimed. It's just not possible. Did I miss something? The PS5 dropping to 34 fps at 1080p? How can a GTX 1060 beat a PS5 by a landslide? The difference is huge, when we all know a 1060 is not even close to a PS5.
 

SonGoku

Member


At 3:42, "We added custom hardware for machine learning."



At 1:03, "Have an actual custom silicon AI upscaler."

As far as we know, they can't be any clearer.

That wording does not necessarily mean a separate NPU; both "custom hardware" and "actual custom silicon" can apply to custom AI blocks integrated within the GPU.
It doesn't make much sense to include a separate NPU from a die-area perspective. It's more efficient to include the custom AI units within the GPU block rather than in a separate block. Especially with Sony, who make their chips as small as possible, I don't see them making an NPU when GPU AI units are much more area-efficient and produce greater TOPs as a result.
There's nothing stopping a small block from accounting for some of the TOPs performance.

AMD combines the AI of their GPU and the AI engine on Strix Point.
Ask yourself: what would the benefit of including a separate NPU block be? Why should they invest die space in that, when you can get better results by integrating the AI units/Tensor Cores inside the main GPU block?

With the die space required for a separate NPU block, they could add more CUs, more AI units, and more cache.
 
Last edited:

shamoomoo

Banned
That wording does not necessarily mean a separate NPU; both "custom hardware" and "actual custom silicon" can apply to custom AI blocks integrated within the GPU.
It doesn't make much sense to include a separate NPU from a die-area perspective. It's more efficient to include the custom AI units within the GPU block rather than in a separate block. Especially with Sony, who make their chips as small as possible, I don't see them making an NPU when GPU AI units are much more area-efficient and produce greater TOPs as a result.

Ask yourself: what would the benefit of including a separate NPU block be? Why should they invest die space in that, when you can get better results by integrating the AI units/Tensor Cores inside the main GPU block?

With the die space required for a separate NPU block, they could add more CUs, more AI units, and more cache.
If I'm not mistaken, part of the leak suggested there was an NPU block on the Pro. I can't say I care either way. Also, if the NPU block is small enough to be a non-issue, then I don't see why it can't be included on the Pro, because the theoretical TOPs of the Pro is higher than the 7900 XTX's, which only makes sense with an NPU plus the GPU.
 
Last edited:

Lysandros

Member
That wording does not necessarily mean a separate NPU; both "custom hardware" and "actual custom silicon" can apply to custom AI blocks integrated within the GPU.
It doesn't make much sense to include a separate NPU from a die-area perspective. It's more efficient to include the custom AI units within the GPU block rather than in a separate block. Especially with Sony, who make their chips as small as possible, I don't see them making an NPU when GPU AI units are much more area-efficient and produce greater TOPs as a result.

Ask yourself: what would the benefit of including a separate NPU block be? Why should they invest die space in that, when you can get better results by integrating the AI units/Tensor Cores inside the main GPU block?

With the die space required for a separate NPU block, they could add more CUs, more AI units, and more cache.
Singular. "An actual custom silicon AI upscaler that performs the upscaling and anti-aliasing and frees up a lot of the GPU to render pure graphics" sounds like "AI units within CUs" to you? It wouldn't have been difficult to word it that way if that's what he was referring to, don't you think? I am not saying the CUs themselves don't have ML capabilities via VOPD + sparsity/lower precision, by the way, but neither of those features is "custom" to PS5 PRO. I am only sticking to statements coming from actual Sony sources. They can design and integrate things like the hardware ID buffer, Cache Scrubbers, the Tempest Engine, and an entire I/O complex within the APU just fine regardless of die-space implications, so why does a dedicated custom ML block become such an unfathomable sacrilege all of a sudden?
 
Last edited:

PaintTinJr

Member
If PSSR uses 2 ms of GPU time, it can't be a separate AI block, can it? Although this 2 ms figure comes from a rumor; I don't remember from whom.
It is a pretty logical limit given that PSSR has examples working at 120Hz: you've only got 8.33 ms total for the entire frame rendering at 120Hz, including PSSR, so PSSR taking more than 20% of a frame makes little sense IMHO.
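That budget arithmetic, spelled out (assuming the rumored ~2 ms PSSR cost; the function is just illustrative):

```python
# Frame-time budget arithmetic for the rumored ~2 ms PSSR cost.
def pssr_share(fps, pssr_ms=2.0):
    frame_ms = 1000 / fps          # total frame budget in milliseconds
    return pssr_ms / frame_ms      # fraction of the frame spent on upscaling

print(f"{pssr_share(120):.0%} of a 120 Hz frame")  # 24% of a 120 Hz frame
print(f"{pssr_share(60):.0%} of a 60 Hz frame")    # 12% of a 60 Hz frame
```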
 

Lysandros

Member
If PSSR uses 2 ms of GPU time, it can't be a separate AI block, can it? Although this 2 ms figure comes from a rumor; I don't remember from whom.
Sure it can. DLSS also has a GPU cost in ms despite the Tensor cores. Those dedicated components help, but they don't make the reconstruction free. The quote says it "frees up a lot of the GPU to render pure graphics", not all of it.
 
Last edited:

winjer

Gold Member
An NPU at 300 TOPs would take a lot of die area. RDNA4, if its final clock is 3.46 GHz, would be 450+ TOPs.

Exactly. Even the newest AMD Ryzen AI 9 HX 370 only has an NPU with 50 TOPs.
But of course AMD inflates this number to 80 TOPs by counting the WMMA units in the GPU.
I don't think Sony has yet disclosed the TOPs numbers for the Pro. But if the leak of 300 TOPs is real, then it's probably by using WMMA instructions available on all the shader units of the GPU.
 

Lysandros

Member
Exactly. Even the newest AMD Ryzen AI 9 HX 370 only has an NPU with 50 TOPs.
But of course AMD inflates this number to 80 TOPs by counting the WMMA units in the GPU.
I don't think Sony has yet disclosed the TOPs numbers for the Pro. But if the leak of 300 TOPs is real, then it's probably by using WMMA instructions available on all the shader units of the GPU.
Why do we assume that the dedicated block in itself has to be 300 TOPs? Can't it be WMMA plus an ML unit? How much die space would a ~35 TOPs unit take?
 
Last edited:

winjer

Gold Member
Why do we assume that the dedicated block in itself has to be 300 TOPs? Can't it be WMMA plus an ML unit? How much die space would a ~35 TOPs unit take?

That could also be an option.
The NPU in the Ryzen AI 300 series is an XDNA 2 block, and it could be used as well on the Pro.

I really wish Cerny had made a more detailed presentation of the Pro, so we didn't have to be speculating so much about it.
 
I am not saying that PS5 PRO doesn't have SWMMAC at all. I am only sticking to the official statements. By "custom hardware for ML" or "an actual custom silicon for AI upscaling" i understand a custom/specific hardware component akin to PS4 PRO's hardware ID buffer or PS5's Cache Scrubbers. Sony/Cerny didn't call PS4 PRO's RPM feature "custom" for a reason, because this was a standard feature AMD's next/future architecture (Vega).
By using SWMMAC, PS5 Pro will easily get near 300 TOPs (likely a rounded-up number). When you look at RDNA4 SWMMAC you'll see there are plenty of different instructions available on top of the RDNA3 and RDNA3.5 ML stuff, including INT8 ones, and plenty of others. Sony has probably picked the instruction(s) needed for PSSR and discarded the rest. SWMMAC was developed for performance and better efficiency and is specifically designed to run on RDNA4. Sony is more than likely using IP already tested and ready for them. Remember that they also supposedly use the RT units from RDNA4. No need to reinvent the wheel.

Similarly, Sony uses a custom Zen 2 CPU in the PS5 with highly modified FPU units. It's still Zen 2, but without stuff not needed in most gaming applications. Technically it's custom CPU silicon only used in the PS5 (and PS5 Pro).
 
Last edited: