
Next-Gen PS5 & XSX |OT| Console tEch threaD


CrysisFreak

Banned
Insane how much time and effort people invest in DualSense mockups lol
 
So, what is the consensus on clock rate increasing performance beyond a lower-clocked GPU which has the same TFLOP number?

People like to use this as an explanation for why the PS5's GPU will punch above its weight.

Cerny's example quoted a significantly lower-clocked GPU; I wonder what the sweet spot is for CUs and clock speed.

I really don't believe that a 36 CU GPU @ 2.23GHz will have more performance than a GPU of 44 CUs @ 1826MHz.

In certain tasks it will, but in other tasks it won't. Basically, the smaller chip clocked higher can do more batches of smaller operations per second sequentially, while the larger chip clocked lower will do fewer batches per second, but each batch will be bigger (so it can do more per second in parallel).

So you can think of the smaller chip as a highway with fewer lanes but the speed limit is higher, and the larger chip as a different highway with more lanes but a lower speed limit. In the case of PS5 and XSX, both highways are trying to reach a destination, so the main question is which approach works out better for the given destination. Some destinations will favor PS5's approach and others will favor XSX's approach.

That's one way to look at it 🤷‍♂️
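For what it's worth, both configs land on basically the same headline number; here's a quick back-of-envelope sketch (the 64 FP32 ALUs per CU and 2 ops per clock are my own assumptions for RDNA-style CUs):

```python
# Rough FP32 peak: CUs * ALUs/CU * ops/clock * clock (GHz) -> TFLOPS
def peak_tflops(cus, clock_ghz, alus_per_cu=64, ops_per_clock=2):
    return cus * alus_per_cu * ops_per_clock * clock_ghz / 1000.0

print(peak_tflops(36, 2.23))   # ~10.28 TF (narrow and fast)
print(peak_tflops(44, 1.826))  # ~10.28 TF (wide and slower)
```

Same peak TFLOPs either way, so the real question is which shape is easier to keep fed, which is exactly what the highway analogy is getting at.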
 
The hardware and software involved according to Microsoft are
  1. NVMe SSD,
  2. a dedicated hardware decompression block,
  3. the all new DirectStorage API,
  4. and Sampler Feedback Streaming (SFS).
No, I won't ignore you, but you are really committed to calling me a fanboy. Lol

I'm really not. But this board has definite fanboy overtones.

Yes, those are the elements of the XVA. How they work together to enable the Vmem (I think Spencer calls it extended RAM) is NOT currently known.
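Purely as a toy illustration of the general idea behind Sampler Feedback Streaming (only fetch the texture tiles the GPU actually sampled, rather than whole mip chains), not the real API or pipeline, which hasn't been detailed publicly:

```python
# Toy sketch only: names and data layout are made up, not DirectStorage/SFS.
def tiles_to_stream(sampled_tiles, resident_tiles):
    """Sampler feedback reports which (texture, mip, tile) entries were touched;
    only the ones not already resident in memory need an SSD read."""
    return [t for t in sampled_tiles if t not in resident_tiles]

resident = {("rock_albedo", 0, (3, 7))}
sampled = [("rock_albedo", 0, (3, 7)), ("rock_albedo", 1, (1, 2)), ("grass_normal", 2, (0, 0))]
for tex, mip, tile in tiles_to_stream(sampled, resident):
    print(f"queue SSD read: {tex} mip {mip} tile {tile}")  # would then go through the decompression block
```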

Again the best guesses by everyone including DF is that this
I think they knew what they were doing when they oversimplified the explanation like that. Just like adding a 13TF estimate on top of 12 when talking about ray tracing, and that wasn't for clarity.

There are lots of points made like this. For example, the XSX audio block takes weight off the equivalent of 5 Zen cores according to MS. Sony says theirs replaces the equivalent of 10...

We don't have any clue about the raytracing capabilities of either device. It just seems like people take some things as gospel, extrapolate from them, and then infer that the other console is lying when it states similar capabilities.

I can't be the only person seeing this.
 

Sosokrates

Report me if I continue to console war
In certain tasks it will, but in other tasks it won't. Basically, the smaller chip clocked higher can do more batches of smaller operations per second sequentially, while the larger chip clocked lower will do fewer batches per second, but each batch will be bigger (so it can do more per second in parallel).

So you can think of the smaller chip as a highway with fewer lanes but the speed limit is higher, and the larger chip as a different highway with more lanes but a lower speed limit. In the case of PS5 and XSX, both highways are trying to reach a destination, so the main question is which approach works out better for the given destination. Some destinations will favor PS5's approach and others will favor XSX's approach.

That's one way to look at it 🤷‍♂️

Then why, with PC GPUs, do we not see higher-clocked GPUs perform better than lower-clocked ones?
 
I get what you are saying, but that doesn’t really make sense based on where Xbox is at – going into next-gen against a backdrop of PS4 and Switch sales success.
If they've got a numbers advantage on anything, then as the challenger for market dominance they would have been singing about it already (IMHO) – especially when considering the discourse that would have dovetailed with such Xbox positivity, and would have doubled down on the message that their console will be running a 12 TF workload all the time while PS5 will occasionally be at 10.3 TF.

(AFAIK) They aren't so interested in convincing the 30+ gamer who is into GAF, Era, DF, etc. and doing extensive scrutiny of the hardware/software strategy. They seem more interested in swaying the minds of game shop employees (early doors, before things are cleared up), as their personal opinions land the whales (school kids) on a platform IMO and control the future landscape of console brands.

I can't say that I agree with that position. MS has been much more open about the components of their system than Sony to date.

Showing off what looks like finished hardware, showing teardowns and assembly.

What they haven't done is a GDC-level talk, and they certainly haven't published one if it exists.
 

NorthStar

Neo Member
In certain tasks it will, but in other tasks it won't. Basically, the smaller chip clocked higher can do more batches of smaller operations per second sequentially, while the larger chip clocked lower will do fewer batches per second, but each batch will be bigger (so it can do more per second in parallel).

So you can think of the smaller chip as a highway with fewer lanes but the speed limit is higher, and the larger chip as a different highway with more lanes but a lower speed limit. In the case of PS5 and XSX, both highways are trying to reach a destination, so the main question is which approach works out better for the given destination. Some destinations will favor PS5's approach and others will favor XSX's approach.

That's one way to look at it 🤷‍♂️
If the RDNA 2 chips do allow for greater performance at higher frequency, as Cerny stated ("greater than linear performance"), and rumours say the new RDNA 2 can be clocked above the PS5's clocks, then why would MS not have gone for the higher frequency on the XSX?

Seeing as it's also made with the same RDNA 2 chip as the PS5, they would be leaving quite a bit of performance on the table, unless their architecture for some reason cannot support it?
 
Then why, with PC GPUs, do we not see higher-clocked GPUs perform better than lower-clocked ones?

The work a GPU is tasked with is highly parallel in nature. So much so that I've never seen a 5-year-old game have a problem saturating a new GPU, even though the GPUs available at the time of its release were much narrower. However, never underestimate the power of the PR department in trying to spin any weaknesses; that goes for Sony and MS.
 

SonGoku

Member
So, what is the consensus on clock rate increasing performance beyond a lower-clocked GPU which has the same TFLOP number?
If it's the exact same GPU architecture, there are two advantages:
  1. Other parts of the GPU pipeline perform better: ROPs, caches, geometry processor, ACEs etc. IF XSX has the same frontend, PS5 could have an advantage, but that wouldn't be enough to close the gap.
  2. The smaller GPU will reach higher VALU utilization (closer to peak).
Cerny's example quoted a significantly lower-clocked GPU; I wonder what the sweet spot is for CUs and clock speed.
Consoles aim for the sweet spot; no point in increasing costs for negligible gains.
I really don't believe that a 36 CU GPU @ 2.23GHz will have more performance than a GPU of 44 CUs @ 1826MHz.
XSX will remain on top; PS5 can't go beyond its established computational limits, it will just get closer to peak (higher VALU utilization) compared to XSX.
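To put some (made-up) numbers on what "closer to peak" means: the utilization figures below are purely illustrative, not leaked data, they just show how the narrower GPU can close part of the gap without ever exceeding its own peak.

```python
def effective_tflops(cus, clock_ghz, utilization):
    peak = cus * 64 * 2 * clock_ghz / 1000.0   # same back-of-envelope peak as earlier in the thread
    return peak * utilization

# Hypothetical utilization values, purely to illustrate the argument
print(effective_tflops(36, 2.230, 0.90))  # ~9.2 TF effective on the narrower, faster GPU
print(effective_tflops(52, 1.825, 0.85))  # ~10.3 TF effective on the wider GPU, still ahead
```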
 
The 5700 @ 2150MHz and a stock 5700 XT perform about the same. According to Cerny, the higher-clocked 5700 should perform better in this situation.

The 5700 can't sustain the 2150 throughout most of the test. The 5700XT can rarely sustain its own boost, even in perfect airflow situations. It's hard to make comparisons based on those tests.
 

xool

Member
There are lots of points made like this. For example, the XSX audio block takes weight off the equivalent of 5 Zen cores according to MS. Sony says theirs replaces the equivalent of 10...

Both teams have marketing BS on overdrive. Like, we start with 8 cores, lose 1 to the OS, then apparently the audio would use 5 or 10, leaving us with 2 or -3 cores left. In this exaggerated world you need an i9 to play Minesweeper.

We also have the fanboys who run with that info and take marketing numbers as gospel.
 

SonGoku

Member
The 5700 @ 2150MHz and a stock 5700 XT perform about the same. According to Cerny, the higher-clocked 5700 should perform better in this situation.
The 5700 is hitting diminishing returns on clocks way beyond its sweet spot for power delivery, logic timing, bandwidth etc.
A proper test would eliminate those bottlenecks by using lower clocks.
I think so. These use a BSD-based OS (not Linux) which afaik uses decimal GB numbers generally for file listings etc. Windows is different and has always used the bigger binary GB.
So PS5 consumers will see the full 825GB, sans OS allocation?
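Quick conversion of why the same drive shows different numbers depending on whether the OS counts in decimal or binary units:

```python
advertised_gb = 825                       # decimal GB, as printed on the box
total_bytes = advertised_gb * 1000**3
print(total_bytes / 1000**3)              # 825.0  "GB"  (decimal, what a BSD-style listing shows)
print(total_bytes / 1024**3)              # ~768.3 "GiB" (binary, the unit Windows labels as GB)
```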
 
That's my understanding as well, but in their rush to give catchy names to things, and on top of that alluding to 100GB of assets rather than saying that the entire SSD can be accessed very fast, they just caused confusion, hence this conversation I'm having here.

Well, compared to PCs and current-gen systems, yes, the entire drive can be accessed very fast; it only seems slow in relation to PS5's SSD is all. It's just that the 100 GB partition will likely be specifically set up for game-related texture and data streaming.


Then why, with PC GPUs, do we not see higher-clocked GPUs perform better than lower-clocked ones?

Well, for one it's mainly down to the fact that most games don't target specific GPU cards, and don't optimize for scaling game engine performance with high clock frequencies. The higher clocks basically "brute force" the performance gains (and generally not very well in relation to the amount of power required to get the marginal performance gains). That's a problem with RDNA1 cards, for example: once you move past the tip of the sweetspot ( 🤤 ), the ratio of power needed for what performance gains you get becomes incredibly lopsided.

That's probably been addressed somewhat with RDNA2, but by what amount is unknown, let alone on what process node (RDNA2 supports more than one). What you'll see a lot on PCs is, when two GPUs are identical in size and base performance, the higher-clocked one DOES perform better than the lower-clocked one. But like SportsFan581 kind of mentioned, GPUs are extremely parallel in their nature (they're basically a super cluster of graphics-oriented DSPs; in fact they're one of the reasons DSPs aren't as widely used in consoles anymore), so it's not really that difficult to saturate large GPUs with tasks, since that's kind of what GPUs specialize in.

If the RDNA 2 chips do allow for greater performance at higher frequency, as Cerny stated ("greater than linear performance"), and rumours say the new RDNA 2 can be clocked above the PS5's clocks, then why would MS not have gone for the higher frequency on the XSX?

Seeing as it's also made with the same RDNA 2 chip as the PS5, they would be leaving quite a bit of performance on the table, unless their architecture for some reason cannot support it?

Because I actually have to take Cerny's words with a pinch of salt there. You can't actually get greater-than-linear performance, because that would mean you're using some other silicon altogether. Or, another way to interpret that line is him trying to imply they need less power draw beyond the sweetspot than you'd need at or below it. That's just not possible.

Once you move beyond the sweetspot, you get worse-than-linear scaling relative to the ratio in the sweetspot, otherwise there wouldn't be a reason to specify a sweetspot in the first place. Cerny's words there could be in reference to their cooling solution, however, being designed in such a way to offset a lot of the extra heat and power generated when they hit those higher clocks, maybe allowing them to push higher (at the cost of more expensive cooling, of course). And it could also be partly in reference to AMD's SmartShift technology, too.
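The reason it gets lopsided past the sweetspot is the usual dynamic power relation, roughly P ≈ C·V²·f, with voltage having to climb once you push frequency past the efficient range. The numbers in this little model are made up purely to show the shape of the curve:

```python
def relative_power(freq_ghz, base_freq=1.8, base_v=1.0, v_per_ghz=0.25):
    """Toy model: power scales with f * V^2, and V rises with f past the sweetspot (made-up curve)."""
    v = base_v + max(0.0, freq_ghz - base_freq) * v_per_ghz
    return (freq_ghz / base_freq) * (v / base_v) ** 2

for f in (1.8, 2.0, 2.23):
    print(f, round(relative_power(f), 2))
# 1.8 -> 1.0, 2.0 -> ~1.23, 2.23 -> ~1.52: roughly 24% more clock for ~52% more power in this toy model
```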

As for why MS didn't clock higher? Well, they could've, but then they might've needed a variable frequency approach, especially since their chip is larger. And the cooling system in XSX is already pretty big for what it is. MS probably wanted stable locked clocks over variable ones; seeing how they also want to use XSX in server blades, stability is paramount in that case and would be a considerable influencing factor.
 
Well, compared to PCs and current-gen systems, yes, the entire drive can be accessed very fast; it only seems slow in relation to PS5's SSD is all. It's just that the 100 GB partition will likely be specifically set up for game-related texture and data streaming.

I'm hoping there is no 100GB partition for streaming; that sounds like a drive killer (similar to an SSD cache on a SQL server), creating a lot of unnecessary writes, copying data from one part of the drive to another. I'm hoping the games are packaged so that the install itself is accessible by the CPU/GPU directly; this would eliminate wasted writes.
 

Sosokrates

Report me if I continue to console war
So if GPU clock increase does not yield (varied) performance increase... why do manufacturers release "overclocked" versions?

Because the clock increase provides an overall TFLOP improvement; a GPU maker has never released a GPU with the same TFLOPs but different CU counts and clock speeds.

It seems I'm not going to get a solid answer until the PS5 and RDNA 2 GPUs come out, but I know what outcome my money is on.
 

geordiemp

Member
If the RDNA 2 chips do allow for greater performance at higher frequency, as Cerny stated ("greater than linear performance"), and rumours say the new RDNA 2 can be clocked above the PS5's clocks, then why would MS not have gone for the higher frequency on the XSX?

Seeing as it's also made with the same RDNA 2 chip as the PS5, they would be leaving quite a bit of performance on the table, unless their architecture for some reason cannot support it?

MS chose 3.8 GHz for the CPU and a lower GPU clock; Sony chose 3.5 GHz and a higher GPU clock.

Also, we have not seen the Sony cooling or the so-called "cooling patent" they have...
 

Bo_Hazem

Banned
Actually TOSLINK has a data rate of 125Mbit per second, so really low; that's the reason it doesn't support modern audio codecs and got replaced by HDMI long ago. TOSLINK is a dead technology we didn't want to abandon, and the fault lies with console makers for never offering standard solutions for headsets.

Interesting, so we could expect PS5 to ditch the optical audio as well?
 
I'm hoping there is no 100GB partition for streaming; that sounds like a drive killer (similar to an SSD cache on a SQL server), creating a lot of unnecessary writes, copying data from one part of the drive to another. I'm hoping the games are packaged so that the install itself is accessible by the CPU/GPU directly; this would eliminate wasted writes.

I can see the point there, because any game would have to fit its data within the 100 GB partition for that direct access and streaming by the GPU, CPU etc. Like you said, that'd be a drive killer (speaking of SQL servers, at least they have the benefit of SLC NAND in their drives. I doubt either system will have even 64 GB of that in their own).

From the sound of things they seem to want to cut down on unnecessary file installs, redundant data, etc. Plus, if the ML and AI features work out for textures in the vein of DLSS, games might not need to ship with the large texture data in their packages by default, which will save on drive space and write cycles.

But it does raise the question: if the whole SSD can (potentially) do what we've been attributing to the 100 GB partition the whole time, then what would the 100 GB partition really be used for? It couldn't just be more of the same, otherwise there'd be no reason to highlight it. Guess we'll find out some time xD.
 

Bo_Hazem

Banned
Speaking of MS screwing me over - does anyone know if Series X will support USB audio?

Honestly, I think they'll provide better audio directly from the 3.5mm AUX; I can see Sony doing the same but with their exotic Tempest engine as they promised. So it might just be a dead tech after all?
 
But it does raise the question: if the whole SSD can (potentially) do what we've been attributing to the 100 GB partition the whole time, then what would the 100 GB partition really be used for? It couldn't just be more of the same, otherwise there'd be no reason to highlight it. Guess we'll find out some time xD.

Good point. I assume a decent amount of storage will be partitioned away from end users for quick resume; you might need 7 or 8 GB for each suspended title (depending on how efficiently they can drop unneeded addresses from the image). We probably aren't going to know all the tricks until games get released and devs point out how things were done.
 
afaik Cerny never claimed greater than linear

He said something to the effect of "we can achieve greater than linear scaling", but trying to scan through relevant parts of the vid right now I can't find the quote. Might have to rewatch the whole thing because I can't remember when exactly it was mentioned (and if it wasn't mentioned, my mistake).

That said, just rewatching it, he mentions the GPU running 33% faster giving the caches more bandwidth, but this is why I take some issue with mixing up "bandwidth" and "capacity". From the sound of that, it's the speed (and therefore bandwidth) of the caches that increases with the GPU clock; the actual amount of data the caches can physically hold at any one time depends on the size of the cache, and that won't suddenly increase with things running faster.
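To make the distinction concrete: a clock bump scales how many bytes per second the cache can move, not how many bytes it can hold. The bytes-per-cycle figure below is an assumption purely for illustration:

```python
BYTES_PER_CYCLE = 64   # assumed line width delivered per cycle, illustration only
CAPACITY_MB = 4        # whatever the cache physically is; unchanged by clock speed

for clock_ghz in (1.67, 2.23):   # baseline picked so the delta is roughly the "33% faster" figure
    bandwidth_gb_s = BYTES_PER_CYCLE * clock_ghz   # GHz = 1e9 cycles/s, so this lands in GB/s
    print(f"{clock_ghz} GHz -> ~{bandwidth_gb_s:.0f} GB/s of cache bandwidth, still {CAPACITY_MB} MB of capacity")
```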

Bit of a nitpick on my end, but had to mention it after just rewatching that part.
 

rnlval

Member
1. This doesn't do what you think it does; it doesn't affect or change how interleaved memory works. It's mainly CPU & GPGPU oriented, it also under-utilizes bandwidth, and again it has no impact on interleaved memory. Nothing changes.

2. It has already been pointed out to you this is not the case: only one decompression unit, which handles both compression algorithms.
Having two decompressors isn't even good design.
1. You haven't countered my arguments.

2. How are you? Disprove this.




Two automatic decompression paths for different data types are not rocket science.

WinZip has GPGPU acceleration.
 

Neo Blaster

Member
Absolutely. And that's what everyone seems to forget: there's a vast difference between one 10GB file and thousands of tiny 1MB files. MS and Sony, just like any HDD/SSD producer, obviously provided the maximum (sequential) possible value, which rarely represents actual real-life performance. I assume the custom chips built into the consoles will help with that, but still only to a certain degree.
I think they can achieve such speeds by splitting that big file among the lanes, so it can be read in parallel and saturate the PCIe bus.
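A rough sketch of the difference being described: one big file read as parallel chunks keeps every lane busy, while thousands of tiny files mostly pay per-request overhead. The chunk size and worker count are arbitrary here:

```python
import concurrent.futures
import os

def read_chunk(path, offset, size):
    with open(path, "rb") as f:   # each worker reads its own slice of the single big file
        f.seek(offset)
        return f.read(size)

def parallel_read(path, chunk_mb=32, workers=8):
    total = os.path.getsize(path)
    chunk = chunk_mb * 1024 * 1024
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda off: read_chunk(path, off, chunk), range(0, total, chunk)))

# Versus thousands of ~1MB files, where request and filesystem overhead dominates instead of raw throughput.
```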
 

NorthStar

Neo Member
afaik Cerny never claimed greater than linear
I just checked, you are right; he was speaking about the non-linear relationship between frequency and power consumption.
Edit: I think it came from the DF interview with Cerny; not sure if that is what Cerny said or if that's just what Richard thinks, that the PS5 should be more capable than its TFLOPS number suggests:
“Sony’s pitch is essentially this: a smaller GPU can be a more nimble, more agile GPU, the inference being that PS5’s graphics core should be able to deliver performance higher than you may expect from a TFLOPs number that doesn’t accurately encompass the capabilities of all parts of the GPU,” Digital Foundry editor Richard Leadbetter explains.
 

rnlval

Member
Nope.
It only has one decompression hardware block that handles both zlib and BCPack; there are no alternative IO paths.


By "second component" they mean in addition to the SSD (2.4GB/s of guaranteed throughput).
6GB/s is just a peak figure the decompression block can handle, just like the 22GB/s figure for the PS5's.

That 4.8GB/s figure already accounts for BCPack's higher compression (100%) for textures.
Your argument is bullshit. Who are you? Do you have the authority to override MS's Andrew Goossen?

The decompression hardware has two supported compression formats

1. ZLib for general datatypes.
2. BCpack for texture datatypes.

That's two different ASIC IP blocks, one for each compression format.

It's not rocket science to detect a compression format's headers.
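For what it's worth, the "route by header" part really is trivial; whether the silicon behind it is one block or two is the actual disagreement. A minimal sketch of the routing idea, using real zlib but with the BCPack branch as a made-up placeholder since that format isn't public:

```python
import zlib

def decompress(blob: bytes) -> bytes:
    if blob[:1] == b"\x78":              # typical first byte (CMF) of a zlib stream
        return zlib.decompress(blob)     # general data path
    if blob[:4] == b"BCPK":              # invented magic number; real BCPack framing is not public
        return bcpack_decode(blob[4:])   # hypothetical texture path
    raise ValueError("unknown compression format")

def bcpack_decode(payload: bytes) -> bytes:
    raise NotImplementedError("stand-in for the texture decompressor")
```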
 

M-V2

Member
Ah okay, that clears things up. It's not just the fact the SSDs are too slow for data to be worked on in the same way as RAM; NAND just has its own quirks that prevent that being the case (granularity of read/write operations being too large, cell integrity degradation with repeated program/erase cycles, etc.).

However, the thing with both systems is that the SSDs are intended more so for direct read access by the GPU, CPU, and other chips. So the speeds are fast enough for things such as certain types of texture streaming, streaming of audio assets, etc. Other neat little things as well.

I guess MS just chose the term "virtual memory" because they thought it would be the easy way for most gamers to comprehend the concept? It's not a particularly accurate description though.



Who is downplaying? The truth is we don't have all the critical info on the SSDs for either system so it's hard to discern what the delta on that front will actually be until we get that information. This is a very sane and rationalized way to look at the situation for the time being.

PS5 SSD will still have the raw advantage, but customizations and optimizations on both ends could either keep the delta the same or have it shrink. It has a probability of happening so it's okay to keep that possibility open. Again, we don't know what specific type of NAND the companies are using (not just in terms of QLC, TLC, MLC etc. but even just the manufacturer part numbers because that could help with finding documentation), we don't know the random access times on first page or block, we don't know the random access figures in general, the latency of the chips, page sizes, block sizes etc.

We don't even know everything about the compression and decompression hardware/software for them yet, or full inner-workings of the flash memory controllers. I don't think questioning these things automatically translates to trying to downplay one system or another. People are allowed to question things like GPU CU cache amounts for the systems (usually in question if XSX has made increases to the cache size to scale with the GPU size and offset compromises with the memory setup and slower GPU clockspeed for example), so questioning the SSD setup in both systems should also be allowed on the table.

Meanwhile, you are speculating with some of your own numbers (priority levels for the XSX SSD have not been mentioned IIRC), which is fair for you to do, but don't feel as if you can throw that type of speculation out there and then insist that people merely speculating on aspects of the SSDs that haven't been divulged yet are trying to downplay the system with the SSD advantage.

It's not that serious :LOL:
I have been on this thread for a while, and I know who questions it & who downplays it. I say what I like based on my observations; whether you like it or not is up to you. I'm allowed to say anything as long as it doesn't go against the thread policy. And I know very well what I'm talking about.
 

SonGoku

Member
I estimated 10. Something wrong with my maths. Add 30GB to that.
90% of 825GB is 742.5GB
After 10GB OS that leaves 732.5GB
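Same sums in one place (the ~10% reserve and the 10GB OS footprint are the assumptions from the quote above, not confirmed figures):

```python
advertised_gb = 825
usable = advertised_gb * 0.90   # 742.5 GB if ~10% is held back for overprovisioning
print(usable - 10)              # 732.5 GB left after an assumed 10 GB OS allocation
```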
1. You haven't countered my arguments.

2. How are you? Disprove this.
1. Your response wasn't relevant to what I said; you name-dropped a random technique that has nothing to do with interleaved memory. For simultaneous GPU/CPU access using 16-bit addressing you are effectively halving their respective bandwidths: 280GB/s (GPU) & 168GB/s (CPU). Physical limitation of each chip.

2. I did, you need to work on reading comprehension
First component: SSD
Second component: Decompression block
"The decompression hardware supports Zlib for general data and a new compression [system] called BCPack that is tailored to the GPU textures that typically comprise the vast majority of a game's package size."
One decompression unit handles both compression algorithms
 

Bo_Hazem

Banned
Well no... if you can have a full Atmos setup at your home, it's something beautiful, full speakers I mean. But I know people that can't tell a 128kbps MP3 from a FLAC, so different solutions for different audiences.

Yeah, 128kbps sounds like shit compared to 320kbps, but higher bitrates like CD's 1411kbps feel like they're fucking singing next to you! You can hear the slightest soft touches on a guitar string. :messenger_musical:
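The 1411kbps figure is just uncompressed CD PCM doing the math:

```python
sample_rate, bit_depth, channels = 44_100, 16, 2
print(sample_rate * bit_depth * channels / 1000)   # 1411.2 kbps for plain CD audio
```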

But it's like trying to convince someone that 8K is much cleaner and sharper than 4K, when he's still arguing that 1080p is enough.🤷‍♂️
 

draliko

Member
Yeah, 128kbps sounds like shit compared to 320kbps, but higher bitrates like CD's 1411kbps feel like they're fucking singing next to you! You can hear the slightest soft touches on a guitar string. :messenger_musical:

But it's like trying to convince someone that 8K is much cleaner and sharper than 4K, when he's still arguing that 1080p is enough.🤷‍♂️
Yep, I had the chance to test some high-grade headphones... and they were something sublime (you need a proper amp and DAC too). But considering that ppl like how Beats sound... I like how everyone is praising (rightfully? We'll see) Sony's audio solution and then the same person will use TV speakers or Beats cans...
 

SonGoku

Member
The decompression hardware has two supported compression formats

1. ZLib for general datatypes.
2. BCpack for texture datatypes.
Correct
That's two different ASIC IP blocks, one for each compression format.
Wrong. The same hardware block handles both formats
It's not rocket science to detect a compression format's headers.
You are the only person I've come across with the (incorrect) interpretation that there are two decompression blocks. DF is clear: only one.
 