I am trying to understand the new revelations about how the ESRAM bandwidth figures were calculated, and the explanation for what Microsoft was actually able to measure.
Please correct me if I am wrong.
1. What was described as an accidental discovery of simultaneous read/write capability in the ESRAM was perhaps just new to the developer who leaked the internal Microsoft blog post. The engineers are not THAT incompetent.
2. It seems the ESRAM is split into 4 modules of 8MB each, each with its own 256-bit interface (~27GB/s). Microsoft calls it a 1024-bit interface (109GB/s) because the 4 modules can be accessed in parallel. The catch is that accesses must be spread across all 4 modules to achieve 109GB/s (or 218GB/s with simultaneous read/write). Hence, "of course if you're hitting the same area over and over and over again, you don't get to spread out your bandwidth and so that's one of the reasons why in real testing you get 140-150GB/s rather than the peak 204GB/s". In the theoretical worst case where all the data sat in just one of the 8MB modules, the bandwidth would be as low as 27GB/s (54GB/s with simultaneous read/write). The arithmetic is worked through in the first sketch after this list.
In contrast, the PS4's GDDR5 sits behind a single 256-bit interface at an effective 5.5GHz (5500MT/s). The peak rate ((256 ÷ 8) × 5500 = 176,000MB/s ≈ 176GB/s) does not depend on spreading data across separate modules in the same way; hence, I presume, the practical bandwidth sits closer to the theoretical maximum. The second sketch below runs the same numbers.
3. From Wiki (I know) it appears there is single-port SRAM with separate input and output buses (as opposed to a common I/O bus) that can read on the rising edge of the clock and write on the falling edge. The catch is that read and write operations to the same memory space are pipelined and sequential (a read must be followed by a write in the pipeline). This explains why "if you're only doing a read you're capped at 109GB/s, if you're only doing a write you're capped at 109GB/s". I presume this suits blending operations well, which Leadbetter alluded to in his earlier article, and it also suggests Microsoft fully expects developers to use the ESRAM for framebuffer operations.
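To sanity-check point 2, here is a quick back-of-the-envelope calculation in Python. Two things in it are my own assumptions rather than anything stated in the interview: the 853MHz ESRAM clock (it is simply what the quoted 109GB/s implies for a 1024-bit interface), and the reading of the 204GB/s peak as "an extra write on roughly 7 of every 8 cycles" (204 ÷ 109 ≈ 1.875 = 15/8).

```python
# Back-of-the-envelope ESRAM bandwidth figures from point 2.
# Assumption: an 853MHz ESRAM clock (implied by the quoted 109GB/s);
# "GB" here is 10^9 bytes, as the quoted figures appear to use.

ESRAM_CLOCK_HZ  = 853e6   # assumed clock
MODULE_BUS_BITS = 256     # interface width per 8MB module
NUM_MODULES     = 4       # 4 x 8MB = 32MB total

per_module = MODULE_BUS_BITS / 8 * ESRAM_CLOCK_HZ / 1e9   # ~27.3 GB/s
aggregate  = per_module * NUM_MODULES                     # ~109.2 GB/s (all 4 hit in parallel)
rw_naive   = aggregate * 2                                # ~218 GB/s if every cycle did a read AND a write
rw_quoted  = aggregate * 15 / 8                           # ~204.7 GB/s: an extra write on 7 of every 8 cycles
worst_case = per_module * 2                               # ~54.6 GB/s: all read+write traffic in one module

print(f"per module (read or write): {per_module:6.1f} GB/s")
print(f"all four modules:           {aggregate:6.1f} GB/s")
print(f"naive simultaneous r/w:     {rw_naive:6.1f} GB/s")
print(f"quoted 204GB/s peak:        {rw_quoted:6.1f} GB/s")
print(f"single-module r/w:          {worst_case:6.1f} GB/s")
```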
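And the PS4 comparison with the same arithmetic; the only assumption here is that "5.5GHz" means the effective GDDR5 transfer rate of 5500MT/s, not the command clock.

```python
# PS4 GDDR5: one 256-bit interface at 5500MT/s effective.
GDDR5_BUS_BITS = 256
GDDR5_MT_PER_S = 5500e6   # transfers per second ("5.5GHz" effective)

gddr5_peak = GDDR5_BUS_BITS / 8 * GDDR5_MT_PER_S / 1e9
print(f"PS4 GDDR5 peak: {gddr5_peak:.0f} GB/s")   # 176 GB/s over the single unified pool
```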
Further suggestions welcomed.