Albert Panello ......
…...size was even larger than the PS4 because of the ESRAM, a size which would have allowed them to easily have a +2TF GPU. Building consoles is a gamble made years in advance and MS lost big time with theirs.
IMO it's much simpler, only two cases. There are 5 64-bit controllers used by both the GPU and CPU:
1) Access the first 1GB of each chip (10GB) - accessed by all 5 controllers for 560GB/s for both CPU and GPU.
2) Access the second 1GB of the chips that have a second GB (6GB) - accessed by 3 controllers for 336GB/s for both CPU and GPU.
It works the same in the 360, the X1X, the PS4, the PS4 Pro, the PS5 and the Switch. I see no reason for it to work differently on the XSX (other than the side effect of having two chip types).
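For reference, a quick back-of-the-envelope sketch of where those two numbers come from, going by the published XSX figures (14Gbps GDDR6, ten 32-bit chips on a 320-bit bus, six of them 2GB and four of them 1GB); the only inputs are the chip count and the per-pin data rate:

```python
# Back-of-the-envelope XSX bandwidth math, using the published figures:
# 14Gbps GDDR6, ten 32-bit chips (5 x 64-bit controllers = 320-bit bus),
# six 2GB chips + four 1GB chips.
DATA_RATE_GBPS = 14          # per-pin data rate
CHIP_WIDTH_BITS = 32         # each GDDR6 chip is a 32-bit device
CHIPS_TOTAL = 10             # first 1GB of every chip -> the 10GB pool
CHIPS_WITH_SECOND_GB = 6     # second 1GB of the 2GB chips -> the 6GB pool

per_chip_gbs = CHIP_WIDTH_BITS * DATA_RATE_GBPS / 8   # 56 GB/s per chip

fast_pool_gbs = CHIPS_TOTAL * per_chip_gbs            # all 10 chips in parallel
slow_pool_gbs = CHIPS_WITH_SECOND_GB * per_chip_gbs   # only the 6 bigger chips

print(f"10GB pool: {fast_pool_gbs:.0f} GB/s")   # 560 GB/s
print(f"6GB pool:  {slow_pool_gbs:.0f} GB/s")   # 336 GB/s
```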
…….
Yeah, the 'we would have been better' story for XB1 sounds like fantasy. At the PS4 reveal, Mark Cerny even explained how they had considered an XB1-style design, one with far superior bandwidth to the XB1's - a design the XB1 still didn't come close to matching. It is easy to copy a solution once it's revealed, and 100x harder to do with no one to imitate. It is comments like this from team Xbox that undermine the hard work the real Xbox engineering team did to make a 1.24TF/1.4TF APU console at the time, which was still worlds apart from any regular APU product. Panello in that instance just looks like someone who can't accept being well and truly beaten by an engineering team they could only dream of having access to, and rather than earn respect from others by acknowledging what their competitor achieved, he would rather sully that achievement with FUD, as though the same option had been on the table for them.
In your scenario with the XSX memory controller setup, how and why do they wire 5 Memory Controller Units (MCUs) to the Zen2 for a lackluster 336GB/s of typical CPU access?
The 4C/8T modules used in the Zen2 typically have 1x L3 cache and 1x 64-bit MCU per module. Your scenario more than doubles the CPU-to-MCU wire count just to end up with 10GB of unified fast access and 6GB of unified slow access. That doesn't even factor in the headache of how to fully wire 5 MCUs to a very large CU-count GPU. Typically AMD GPUs use even MCU counts, 256-bit or 384-bit AFAIK: 4 MCUs / 2x 128-bit gangs or 6 MCUs / 3x 128-bit gangs. In your scenario it looks like a lot of additional wiring layers - more heat with more layers IIRC, and certainly more cost and complexity - with minimal benefit over the setup I suggested for running PC ports at Ultra High settings.
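Just to put the 'even counts' point in numbers, a tiny sketch comparing the usual 256-bit and 384-bit layouts with a 5-controller 320-bit bus (64-bit MCUs and 14Gbps assumed across the board, purely for comparison):

```python
# Bus width and peak bandwidth per MCU count, assuming 64-bit MCUs and
# 14Gbps GDDR6 throughout (an assumption, purely for comparison).
DATA_RATE_GBPS = 14

for mcus in (4, 5, 6):              # 4 and 6 = typical AMD, 5 = the XSX case
    bus_bits = mcus * 64
    peak_gbs = bus_bits * DATA_RATE_GBPS / 8
    print(f"{mcus} MCUs -> {bus_bits}-bit bus, {peak_gbs:.0f} GB/s peak")
```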
The setup you described will still have the normal amount of memory contention (for each pool) and still have the headache of scheduling those asymmetric accesses that cost more bandwidth for the same time slice. From an engineering standpoint the complexity versus the benefit of unified memory doesn't add up IMHO. They've already set out their stall that all games should be GPU-bandwidth top-heavy for texturing and shading, so why would they not stick with that analysis? They can save on complexity and on BOM, and still get to write 560GB/s on the spec sheet as though it were a spec win.
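As a purely hypothetical illustration of that 'GPU bandwidth top heavy' split - not any real XDK memory API, just a toy sketch using the published pool sizes - steering the bandwidth-hungry GPU resources into the 10GB pool and CPU-side data into the 6GB pool would look something like:

```python
# Purely hypothetical sketch: a toy two-pool allocator illustrating the
# "GPU bandwidth top heavy" policy.  Pool sizes match the published XSX
# split; the class, method names and policy are invented for illustration.
GIB = 1024 ** 3

class TwoPoolAllocator:
    def __init__(self):
        self.free = {"fast_10gb": 10 * GIB, "slow_6gb": 6 * GIB}

    def alloc(self, size, gpu_bandwidth_heavy):
        # Prefer the fast pool for GPU-heavy resources (textures, render
        # targets), the slow pool for CPU-side data; spill to the other
        # pool only if the preferred one is exhausted.
        order = ["fast_10gb", "slow_6gb"] if gpu_bandwidth_heavy else ["slow_6gb", "fast_10gb"]
        for pool in order:
            if self.free[pool] >= size:
                self.free[pool] -= size
                return pool
        raise MemoryError("out of memory in both pools")

heap = TwoPoolAllocator()
print(heap.alloc(4 * GIB, gpu_bandwidth_heavy=True))    # -> fast_10gb (e.g. textures)
print(heap.alloc(2 * GIB, gpu_bandwidth_heavy=False))   # -> slow_6gb (e.g. game/CPU data)
```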