
PS5 Pro devkits arrive at third-party studios, Sony expects Pro specs to leak

rnlval

Member
Possibly.

Although, PS5-architecture is completely different from PS4 Pro, if I'm not mistaken.
PS4 Pro was nothing but an extra GPU and a bit of extra RAM.

I'm just wondering if these dev kits are actual PS5 Pro devkits, or something in the works for next-gen.
RDNA 1 and RDNA 2 CUs have GCN's Wave64 instruction set backward compatibility. PS5's 36 CUs have strict hardware backward compatibility with PS4 Pro's 36 CUs and PS4's 18 CUs.

RDNA 3 CU has GCN's Wave64 instruction set backward compatibility but dual issue mode is only for Wave32 instruction set. RDNA 3 CU's dual-issue mode is effectively 128 stream processors.
 

winjer

Gold Member
NVIDIA reveals the RT TFLOPS vs shader TFLOPS ratio for Turing RTX, Ampere, and Ada, hence the major reason AMD's RT is inferior.

xwuc5zm.jpg


According to Microsoft's Xbox Series X, RDNA 2 has shader TFLOPS nearly matching RT TFLOPS i.e. 12 TFLOPS shader with 13 TFLOPS raytracing.

RDNA 3 CU has a 1.5X RT instructions in flight increase.

AMD needs to substantially increase RT's raw TFLOPS performance. AMD needs to treat RT seriously when hardware RT affects professional apps and gaming use cases. Hint: There's NO mobile Radeon RX 7000 series shown on 2024-era laptops during CES 2024.

That RT TFLOP comparison is so superficial that it's almost nonsense.
As has been stated in several threads, AMD's lacking RT performance is due to several issues.
One is the lack of dedicated units for managing and traversing the BVH structure. Another is that RT is done in the TMUs.
And probably the worst is that wave occupancy with RT loads is rather low in RDNA2.
 

rnlval

Member
That RT TFLOP comparison is so superficial that it's almost nonsense.
As has been stated in several threads, AMD's lacking RT performance is due to several issues.
One is the lack of dedicated units for managing and traversing the BVH structure. Another is that RT is done in the TMUs.
And probably the worst is that wave occupancy with RT loads is rather low in RDNA2.
FYI, Turing RT cores are next to texture units. https://developer.nvidia.com/blog/nvidia-turing-architecture-in-depth/

BVH data sets are geometry.
 

winjer

Gold Member
FYI, Turing RT cores are next to texture units. https://developer.nvidia.com/blog/nvidia-turing-architecture-in-depth/

BVH data sets are geometry.

Being near something does not mean it's the same unit. Turing has dedicated units for ray tracing, both for BVH traversal and for ray-triangle intersection testing.
On RDNA2, the ray and triangle intersection tests are done in the TMUs.

BVH structures are data sets, distributed in a tree. They are not geometry, although they have volumes encompassing geometry.
 
Being near something does not mean it's the same unit. Turing has dedicated units for ray tracing, both for BVH traversal and for ray-triangle intersection testing.
On RDNA2, the ray and triangle intersection tests are done in the TMUs.

BVH structures are data sets, distributed in a tree. They are not geometry, although they have volumes encompassing geometry.
They can be used to replace geometry in some cases. This is done for the building interiors in Spider-Man 2. In the future, we can expect them to be used more and more instead of geometry.
 

winjer

Gold Member
They can be used to replace geometry in some cases. This is done for the building interiors in Spider-Man 2. In the future, we can expect them to be used more and more instead of geometry.

OMG, that is the nonsense from Digital Foundry. A BVH is just a data structure. Something like this.
The BVH encompasses geometry and divides it up in a data structure, but it's not the geometry.

j4fOBuv.png
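
To make the "data structure, not geometry" point concrete, here is a minimal sketch of how a BVH might be laid out in memory. The names `AABB`, `BVHNode` and `union` are hypothetical, picked for illustration only, and real engines use much more compact layouts. The thing to notice is that the nodes only hold boxes and indices; the triangles themselves live in a separate array.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class AABB:
    """Axis-aligned bounding box: just two corner points."""
    lo: Vec3
    hi: Vec3


@dataclass
class BVHNode:
    """One node of the tree (hypothetical layout).

    Inner nodes list child nodes; leaves list indices into a separate
    triangle array. No vertex data is stored in the tree itself.
    """
    bounds: AABB
    children: List["BVHNode"] = field(default_factory=list)
    triangle_ids: List[int] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children


def union(a: AABB, b: AABB) -> AABB:
    """Smallest box that encloses both input boxes."""
    lo = tuple(min(x, y) for x, y in zip(a.lo, b.lo))
    hi = tuple(max(x, y) for x, y in zip(a.hi, b.hi))
    return AABB(lo, hi)


# Two leaves, each bounding a few triangles, grouped under one root.
leaf_a = BVHNode(AABB((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)), triangle_ids=[0, 1])
leaf_b = BVHNode(AABB((2.0, 0.0, 0.0), (3.0, 1.0, 1.0)), triangle_ids=[2])
root = BVHNode(union(leaf_a.bounds, leaf_b.bounds), children=[leaf_a, leaf_b])
```

The root box here spans both leaves, which is why it is far too coarse to stand in for the triangles it encloses.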
 

rnlval

Member
That RT TFLOP comparison is so superficial that it's almost nonsense.
As has been stated in several threads, AMD's lacking RT performance is due to several issues.
One is the lack of dedicated units for managing and traversing the BVH structure. Another is that RT is done in the TMUs.
And probably the worst is that wave occupancy with RT loads is rather low in RDNA2.
Radeon 7900 XTX's 61 TFLOPS shaders,

Applying the up-to-1.5X RT improvement to XSX's 1.08X ratio lands on 99 TFLOPS.
Applying a real-world 1.3X RT improvement to XSX's 1.08X ratio lands on 85.9 TFLOPS.

7900 XTX's estimated RT TFLOPS are in the RTX 4070 range.
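
For anyone checking the numbers, the estimate above can be reproduced with a few lines. This is only a sketch of the arithmetic, assuming the 1.08X ratio is the 13 / 12 XSX figure quoted earlier in the thread and that the 1.5X and 1.3X uplifts simply multiply on top. The function name is made up for illustration, and nothing here shows that "RT TFLOPS" are comparable between vendors.

```python
# Figures as quoted in the thread (not measured here).
XSX_SHADER_TFLOPS = 12.0
XSX_RT_TFLOPS = 13.0
RX_7900_XTX_SHADER_TFLOPS = 61.0


def estimated_rt_tflops(shader_tflops: float, rt_uplift: float) -> float:
    """Scale shader TFLOPS by the XSX RT/shader ratio and an RDNA 3 uplift factor."""
    xsx_ratio = XSX_RT_TFLOPS / XSX_SHADER_TFLOPS  # roughly 1.08
    return shader_tflops * xsx_ratio * rt_uplift


best_case = estimated_rt_tflops(RX_7900_XTX_SHADER_TFLOPS, 1.5)
real_world = estimated_rt_tflops(RX_7900_XTX_SHADER_TFLOPS, 1.3)
print(f"best case: {best_case:.1f} TFLOPS, real world: {real_world:.1f} TFLOPS")
```

With the unrounded 13 / 12 ratio this gives about 99.1 and 85.9, which matches the figures in the post.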
 

rnlval

Member
Being near something does not mean it's the same unit. Turing has dedicated units for ray tracing, both for BVH traversal and for ray-triangle intersection testing.
On RDNA2, the ray and triangle intersection tests are done in the TMUs.

BVH structures are data sets, distributed in a tree. They are not geometry, although they have volumes encompassing geometry.
The bounding box is an approximation of a subset of the geometry's mass. The ray-triangle intersection test does operate on geometry data.
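
As an illustration of the ray-triangle step, here is a sketch using the well-known Möller-Trumbore algorithm. This is a textbook method chosen for the example; the thread does not say which algorithm any GPU implements in hardware. The helper names are hypothetical.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Return the ray distance t to the triangle, or None on a miss."""
    edge1 = sub(v1, v0)
    edge2 = sub(v2, v0)
    p = cross(direction, edge2)
    det = dot(edge1, p)
    if abs(det) < eps:
        return None  # ray is parallel to the triangle plane
    inv_det = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv_det
    if u < 0.0 or u > 1.0:
        return None  # outside the triangle along the first barycentric axis
    q = cross(s, edge1)
    v = dot(direction, q) * inv_det
    if v < 0.0 or u + v > 1.0:
        return None  # outside the triangle along the second barycentric axis
    t = dot(edge2, q) * inv_det
    return t if t > eps else None  # reject hits behind the ray origin


triangle = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
hit = ray_triangle((0.25, 0.25, -1.0), (0.0, 0.0, 1.0), *triangle)
miss = ray_triangle((2.0, 2.0, -1.0), (0.0, 0.0, 1.0), *triangle)
```

Unlike the box test, this one needs the actual vertex positions, so it is the only step that touches real geometry.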

WyPL7AS.jpg


RDNA 2's RT cores are implemented next to texture units.

In this 7900 XTX Hogwarts example,

UhiG7KW.jpg


BVH traversal is a major factor.
 

winjer

Gold Member
Radeon 7900 XTX's 61 TFLOPS shaders,

Applying the up-to-1.5X RT improvement to XSX's 1.08X ratio lands on 99 TFLOPS.
Applying a real-world 1.3X RT improvement to XSX's 1.08X ratio lands on 85.9 TFLOPS.

7900 XTX's estimated RT TFLOPS are in the RTX 4070 range.

So many problems here.
First, RDNA3's dual-issue mode is used in pretty much no games so far. And even if it were, it would never reach full theoretical peak occupancy.

I don't know where you got the 1.5X and 1.08X, but the ray-triangle intersection testing is done in the TMUs, not the shaders.
So the scaling has to be done in relation to the number of TMUs, not the shader or CU count.

AMD and NVidia have different ways of showing their RT numbers. So there is no direct comparison in RT TFLOPs that can be made between the two.
And this is even worse when we consider that RDNA3 has worse warp/wave occupancy than Ada Lovelace.
 

winjer

Gold Member
The bounding box is an approximation of a subset of the geometry's mass. The ray-triangle intersection test does operate on geometry data.

BVH: Bounding Volume Hierarchy. An "acceleration structure" for ray tracing. Basically a data structure which allows the engine to check quickly what objects a ray (or a bullet) hits.

WyPL7AS.jpg


RDNA 2's RT cores are implemented next to texture units.

Dude, you have the Ray Accelerator right inside the TMU, next to the Texture Filter Units and the Mapping Units.
 

rnlval

Member
So many problems here.
First, RDNA3's dual-issue mode is used in pretty much no games so far. And even if it were, it would never reach full theoretical peak occupancy.

I don't know where you got the 1.5X and 1.08X, but the ray-triangle intersection testing is done in the TMUs, not the shaders.
So the scaling has to be done in relation to the number of TMUs, not the shader or CU count.

AMD and NVidia have different ways of showing their RT numbers. So there is no direct comparison in RT TFLOPs that can be made between the two.
And this is even worse when we consider that RDNA3 has worse warp/wave occupancy than Ada Lovelace.
This is wrong.

gDm8R2R.jpg


S25jEiV.jpg

RDNA 2's ray accelerator unit implementation is next to texture units.

The real issue is I/O bandwidth with the lowest latency SRAM storage.
 

rnlval

Member
BVH: Bounding Volume Hierarchy. An "acceleration structure" for ray tracing. Basically a data structure which allows the engine to check quickly what objects a ray (or a bullet) hits.



Dude, you have the Ray Accelerator right inside the TMU, next to the Texture Filter Units and the Mapping Units.

Bounding box test example
pSaae5D.jpg


The bounding-box test works on an approximation of a subset of the geometry's mass.
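
A bounding-box test like the one pictured is commonly written as a "slab" test. The sketch below assumes axis-aligned boxes and a ray that starts at `origin` and only travels forward; the function name is hypothetical and this is not taken from any vendor's implementation.

```python
import math


def ray_hits_box(origin, direction, box_lo, box_hi):
    """Slab test: intersect the ray with each axis-aligned pair of planes
    and check that the three entry/exit intervals overlap."""
    t_near, t_far = 0.0, math.inf  # t_near = 0 rejects boxes behind the origin
    for o, d, lo, hi in zip(origin, direction, box_lo, box_hi):
        if d == 0.0:
            # Ray is parallel to this slab: it must already be inside it.
            if o < lo or o > hi:
                return False
            continue
        t0 = (lo - o) / d
        t1 = (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return False  # intervals no longer overlap, so the ray misses
    return True


unit_box = ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
print(ray_hits_box((-1.0, 0.5, 0.5), (1.0, 0.0, 0.0), *unit_box))  # passes through the box
print(ray_hits_box((-1.0, 2.0, 0.5), (1.0, 0.0, 0.0), *unit_box))  # passes above the box
```

A hit only says the ray enters the box. Whether it touches any triangle inside still has to be checked separately.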
 

winjer

Gold Member
This is wrong.

gDm8R2R.jpg


S25jEiV.jpg

RDNA 2's ray accelerator unit implementation is next to texture units.

Here is a deep dive, from an engineer, that explains exactly how RDNA 2 and 3 do RT.


AMD RDNA 2 and RDNA 3​

AMD implements raytracing acceleration by adding intersection test instructions to the texture units. Instead of dealing with textures though, these instructions take a box or triangle node in a predefined format. Box nodes can represent four boxes, and triangle nodes can represent four triangles. The instruction computes intersection test results for everything in that node, and hands the results back to the shader. Then, the shader is responsible for traversing the BVH and handing the next node to the texture units. RDNA 3 additionally has specialized LDS instructions to make managing the traversal stack faster.
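
The division of labour described in that passage can be sketched roughly as below. Everything here is a hypothetical software model, not AMD's ISA: `intersect_box_node` stands in for the intersection-test instruction (it tests every child box of one node in a single call), while `traverse` plays the shader, which owns the stack and picks the next node. Triangle nodes are simplified: the sketch just collects the triangle ids in the leaves it reaches and leaves the ray-triangle test out.

```python
import math


def slab_test(origin, direction, lo, hi):
    """Ray vs axis-aligned box, for a ray travelling forward from origin."""
    t_near, t_far = 0.0, math.inf
    for o, d, a, b in zip(origin, direction, lo, hi):
        if d == 0.0:
            if o < a or o > b:
                return False
            continue
        t0, t1 = (a - o) / d, (b - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def intersect_box_node(ray, node):
    """Stand-in for the intersection instruction: test all child boxes
    of one box node (up to four) and hand back the ones that were hit."""
    origin, direction = ray
    assert len(node["children"]) <= 4
    return [c for c in node["children"]
            if slab_test(origin, direction, c["lo"], c["hi"])]


def traverse(ray, root):
    """Shader-side loop: manages the traversal stack and decides which
    node goes to the intersection unit next."""
    candidates = []
    if not slab_test(ray[0], ray[1], root["lo"], root["hi"]):
        return candidates
    stack = [root]
    while stack:
        node = stack.pop()
        if "triangles" in node:
            candidates.extend(node["triangles"])  # leaf: real triangle test omitted
        else:
            stack.extend(intersect_box_node(ray, node))
    return candidates


leaf_a = {"lo": (0, 0, 0), "hi": (1, 1, 1), "triangles": [0, 1]}
leaf_b = {"lo": (2, 0, 0), "hi": (3, 1, 1), "triangles": [2]}
leaf_c = {"lo": (0, 5, 0), "hi": (1, 6, 1), "triangles": [3]}
root = {"lo": (0, 0, 0), "hi": (3, 6, 1), "children": [leaf_a, leaf_b, leaf_c]}

ray = ((-1.0, 0.5, 0.5), (1.0, 0.0, 0.0))
print(sorted(traverse(ray, root)))
```

In this model every step down the tree is a round trip between the loop and the intersection call, which is the kind of back-and-forth the quoted passage describes.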
 

winjer

Gold Member
Bounding box test example
pSaae5D.jpg


The bounding-box test works on an approximation of a subset of the geometry's mass.

The bounding volume is just a target volume that encompasses a lot of geometry.
The part that really matters is the hierarchy, as this is the part that will accelerate the Ray Tracing part, by sending the rays to the proper level, so it can hit the correct triangles.
And this is a data structure. Not a geometric structure.

When we talk about acceleration of a BVH, we are talking about data trees:

 

rnlval

Member
Here is a deep dive, from an engineer, that explains exactly how RDNA 2 and 3 do RT.

LOL. My 7900xtx_hogwarts_indirect_raytracing example is from https://chipsandcheese.com/2023/03/22/raytracing-on-amds-rdna-2-3-and-nvidias-turing-and-pascal

If you read https://chipsandcheese.com/2023/03/22/raytracing-on-amds-rdna-2-3-and-nvidias-turing-and-pascal/

"In the RTX 2060 Mobile’s case, L2 latency is around 120 to 143 ns, depending on whether you’re going through the TMUs".
 

winjer

Gold Member
LOL. My 7900xtx_hogwarts_indirect_raytracing example is from https://chipsandcheese.com/2023/03/22/raytracing-on-amds-rdna-2-3-and-nvidias-turing-and-pascal

If you read https://chipsandcheese.com/2023/03/22/raytracing-on-amds-rdna-2-3-and-nvidias-turing-and-pascal/

"In the RTX 2060 Mobile’s case, L2 latency is around 120 to 143 ns, depending on whether you’re going through the TMUs".

That is talking about how the data can be shuffled around in the GPU, because data has to be transferred to different units as it's operated on.
What that means is that L2 latency measured through the TMU is higher on Turing. But it does not mean it's the TMU that is doing the ray-triangle testing.

BTW, here is an explanation from NVidia about what a BVH is and how to manage it.

 

rnlval

Member
The bounding volume is just a target volume that encompasses a lot of geometry.
The part that really matters is the hierarchy, as this is the part that will accelerate the Ray Tracing part, by sending the rays to the proper level, so it can hit the correct triangles.
And this is a data structure. Not a geometric structure.

When we talk about acceleration of a BVH, we are talking about data trees:

The BVH tree is a way of organizing data, and the bounding-box test works on an approximation of a subset of the geometry's mass before drilling down to the ray-triangle intersection test.
 

rnlval

Member
That is talking about how the data can be shuffled around in the GPU, because data has to be transferred to different units as it's operated on.
What that means is that L2 latency measured through the TMU is higher on Turing. But it does not mean it's the TMU that is doing the ray-triangle testing.

BTW, here is an explanation from NVidia about what a BVH is and how to manage it.

The priority should be AMD's source, not 3rd party clam chowder.

AMD claims ray accelerators are implemented as separate units next to texture units.
 

winjer

Gold Member
The BVH tree is a way of organizing data, and the bounding-box test works on an approximation of a subset of the geometry's mass before drilling down to the ray-triangle intersection test.

The bounding volume is a very coarse entity, that cannot be used to render geometry.
It's only there to set bounds on a group of geometry, to be tested.

BTW, if you still have doubts that Turing RT cores process the BVH Traversal, and not the TMUs, here is NVidia's presentation:


The RT Cores in Turing can process all the BVH traversal and ray-triangle intersection testing, saving the SM from spending the thousands of instruction slots per ray, which could be an enormous amount of instructions for an entire scene. The RT Core includes two specialized units. The first unit does bounding box tests, and the second unit does ray-triangle intersection tests. The SM only has to launch a ray probe, and the RT core does the BVH traversal and ray-triangle tests, and returns a hit or no hit to the SM. The SM is largely freed up to do other graphics or compute work. See Figure 18 for an illustration of Turing ray tracing with RT Cores.
 

rnlval

Member
That is talking about how the data can be shuffled around in the GPU, because data has to be transferred to different units as it's operated on.
What that means is that L2 latency measured through the TMU is higher on Turing. But it does not mean it's the TMU that is doing the ray-triangle testing.

BTW, here is an explanation from NVidia about what a BVH is and how to manage it.

From https://i0.wp.com/chipsandcheese.com/wp-content/uploads/2023/02/rdna2.drawio-1.png?ssl=1

This is from clam chowder

4YugkzA.png

Notice both clam chowder's Ampere and RDNA 2 have "TMU / RT" box.
 

winjer

Gold Member
The priority should be AMD's source, not 3rd party clam chowder.

AMD claims ray accelerators are implemented as separate units next to texture units.

AMD shows the Ray-Accelerator inside the TMU.

And here is AMD's patent for using RT in the Texture Units:

 

winjer

Gold Member

rnlval

Member
You have NVidia clearly stating that both the ray and triangle testing and BVH traversal are done in a dedicated RT core, and yet you claim that NVidia is wrong.
While at the same time, for some reason you claim that AMD's RT is not done in the TMUs....
You linked clam chowder's web page before my post when I knew AMD's actual RDNA 2 presentation claims otherwise i.e. AMD added "ray accelerator" units for RDNA 2.

Additional TMUs can be added and why not extra modified TMUs known as ray accelerator units?

ALU SP and TMU ratio has changed in the past Radeon HD series.
 

winjer

Gold Member
You linked clam chowder's web page before my post when I knew AMD's actual RDNA 2 presentation claims otherwise i.e. AMD added "ray accelerator" units for RDNA 2.

Additional TMUs can be added and why not extra modified TMUs known as ray accelerator units?

ALU SP and TMU ratio has changed in the past Radeon HD series.

My point is, the Ray Accelerators are in the TMUs both for RDNA2 and RDNA3.
And in Turing, Ampere and Ada, they are in a dedicated RT unit.

Here is RDNA3 ISA, where one might notice that there are a lot of RT instructions, in the section "10.9.3. Texture Resource Definition"
Which is a part of 10.9. "Ray Tracing"

 

rnlval

Member
AMD shows the Ray-Accelerator inside the TMU.

And here is AMD's patent for using RT in the Texture Units:

That's meaningless when AMD's actual RDNA 2 presentation has additional "ray accelerators" units. AMD has changed stream processors vs TMU ratios in the past Radeon HD series.

The reason why I didn't post Clamchowder's web link is due to conflicts with AMD's official RDNA 2 presentation.
 

rnlval

Member
My point is, the Ray Accelerators are in the TMUs both for RDNA2 and RDNA3.
And in Turing, Ampere and Ada, they are in a dedicated RT unit.

Here is RDNA3 ISA, where one might notice that there are a lot of RT instructions, in the section "10.9.3. Texture Resource Definition"
Which is a part of 10.9. "Ray Tracing"

Are you claiming CU's TMU vs stream processor ratios never change?
 

winjer

Gold Member
That's meaningless when AMD's actual RDNA 2 presentation has additional "ray accelerators" units. AMD has changed stream processors vs TMU ratios in the past Radeon HD series.

The reason why I didn't post Clamchowder's web link is due to conflicts with AMD's official RDNA 2 presentation.

WTF, you have the detailed ISA instruction manual, published by AMD, saying it's using Texture Resources, yet you still insist the Ray Accelerators are not in the TMUs.
There is no more concrete evidence than AMD themselves saying it. And they say this both for the RDNA2 ISA and the RDNA3 ISA.
 

rnlval

Member
WTF, you have the detailed ISA instruction manual, published by AMD, saying it's using Texture Resources, yet you still insist the Ray Accelerators are not in the TMUs.
There is no more concrete evidence than AMD themselves saying it. And they say this both for the RDNA2 ISA and the RDNA3 ISA.
Meaningless. Radeon CU's TMU scaling is not static.

AMD's RDNA 3's CU presentation with distinct ray acceleration blocks. Are you claiming AMD's presentation is untrue?

KCqq7Fs.jpg
 

winjer

Gold Member
Meaningless. Radeon CU's TMU scaling is not static.

AMD can scale TMUs, CUs, RA and whatever they want as they please.
The fact remains that AMD does the ray intersection testing in the TMUs and the BVH traversal on the shaders, while NVidia has fully dedicated RT cores to do both.
 

rnlval

Member
AMD can scale TMUs, CUs, RA and whatever they want as they please.
The fact remains that AMD does the ray intersection testing in the TMUs and the BVH traversal on the shaders, while NVidia has fully dedicated RT cores to do both.
Whose fact? Clam Chowder's?

Using RT cores in Blender 3D is an extreme RT use case example and it doesn't budget for real-time RT considerations.
 

Zathalus

Member
They can be used to replace geometry in some cases. This is done for the building interiors in Spider-Man 2. In the future, we can expect them to be used more and more instead of geometry.

OMG, that is the nonsense from Digital Foundry. A BVH is just a data structure. Something like this.
The BVH encompasses geometry and divides it up in a data structure, but it's not the geometry.

j4fOBuv.png
Digital Foundry were correct that RT generates the rooms in the buildings, but it's achieved by a bit of digital trickery. Rooms have been created under the city and, based on the ID of the window hit by rays, an interior room is then reflected back out. So the geometry still has to be manually created somewhere else, but it can be displayed onscreen thanks to the really clever use of RT. Starting at 2:47:

 

winjer

Gold Member
Digital Foundry were correct that RT generates the rooms in the buildings, but it's achieved by a bit of digital trickery. Rooms have been created under the city and, based on the ID of the window hit by rays, an interior room is then reflected back out. So the geometry still has to be manually created somewhere else, but it can be displayed onscreen thanks to the really clever use of RT. Starting at 2:47:



That is not the point I was making, but that the BVH is a data structure. Not geometry.
What is being reflected with RT is normal generated geometry.
The BVH is just the data set that accelerates ray tracing, by defining where rays are cast.
 

Zathalus

Member
That is not the point I was making, but that the BVH is a data structure. Not geometry.
What is being reflected with RT is normal generated geometry.
The BVH is just the data set that accelerates ray tracing, by defining where rays are cast.
Just clearing up the misconception regarding the Spider-Man thing. The building interiors were done by RT but it cannot replace geometry as it was just a bit of clever trickery.
 

winjer

Gold Member
Just clearing up the misconception regarding the Spider-Man thing. The building interiors were done by RT but it cannot replace geometry as it was just a bit of clever trickery.

I understand that.
But we have to clarify that DF is constantly making the mistake of saying that the BVH is geometry, when it's a data structure.
The Bounding Volume is closer in concept to a voxel. Not to geometry with primitives.
 

rnlval

Member
AMD shows the Ray-Accelerator inside the TMU.

And here is AMD's patent for using RT in the Texture Units:


That's AMD's original 2019 patent.


AMD's US20200193685 is the recent patent that came out in June 2020.
 

onQ123

Member
When actual specs get leaked, if no one else does, I am making a new thread, because I rarely look in here for a meaningful bump.
Did you hear anything about Volume Rendering for Next-Gen?

It's time for true 3D or 4D if they can work melting ice & plant growth into gameplay somehow

( They're not going to stop arguing about Ray tracing are they? Lol)
 