Everything I mentioned, plus BCPack, and then this:
In light of this, I think it's relevant that I post what I posted in another thread....
Regarding sampler feedback streaming... I'm not sure people get what it actually does... So I'm going to try and explain things step by step...
First, the transfer value given for the I/O/SSD is basically a bandwidth value. The 2.4 GB/s raw value of the XSX means that at most 2.4 GB of data can be transferred per second.
The compressed value does not magically increase the 2.4 GB/s. What compression does is make the files smaller. The maximum amount transferred is still 2.4 GB in a second. But when you decompress it again on the 'other side', the data is equivalent to 4.8 GB that you would otherwise have had to transfer raw. So effectively it's 4.8 GB/s, but in practice 2.4 GB/s is being transferred.
Then we get to SFS. First, take a look at what MS themselves say about it:
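To put numbers on that, here is a minimal Python sketch. The 2x compression ratio is an assumption implied by the 2.4 GB/s and 4.8 GB/s figures; real ratios will vary per asset, and the function names are just for illustration.

```python
RAW_GBPS = 2.4           # raw SSD throughput quoted for the XSX
COMPRESSION_RATIO = 2.0  # assumed average ratio implied by the 4.8 GB/s figure


def effective_gbps(raw_gbps: float, ratio: float) -> float:
    """Uncompressed data delivered per second while the link moves compressed data."""
    return raw_gbps * ratio


def seconds_to_load(uncompressed_gb: float, raw_gbps: float, ratio: float) -> float:
    """Time to deliver a given amount of uncompressed data over the raw link."""
    compressed_gb = uncompressed_gb / ratio  # what actually crosses the bus
    return compressed_gb / raw_gbps


print(effective_gbps(RAW_GBPS, COMPRESSION_RATIO))        # effective rate in GB/s
print(seconds_to_load(4.8, RAW_GBPS, COMPRESSION_RATIO))  # seconds for 4.8 GB of assets
```

The bus never carries more than 2.4 GB in any second; the 4.8 figure only appears once you count the data after decompression.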
Sampler Feedback Streaming (SFS) – A component of the Xbox Velocity Architecture, SFS is a feature of the Xbox Series X hardware that allows games to load into memory, with fine granularity, only the portions of textures that the GPU needs for a scene, as it needs it. This enables far better memory utilization for textures, which is important given that every 4K texture consumes 8MB of memory. Because it avoids the wastage of loading into memory the portions of textures that are never needed, it is an effective 2x or 3x (or higher) multiplier on both amount of physical memory and SSD performance.
news.xbox.com
That last sentence is important: it is an effective 2x or 3x (or higher) multiplier on both amount of physical memory and SSD performance. Now what does that mean? If you want to stream parts of textures, you inevitably need tiling. What is tiling? You basically divide the whole texture into equally sized tiles. Instead of having to load the entire texture, which is large, you load only the tiles that you need from that texture. You then don't have to spend time discarding the parts of the texture that you don't need after you spent resources loading them. It basically increases transfer efficiency.
Tiled resources is a hardware feature that has been present since the first GCN, but there are different tiers to it, the latest one being Tier 4, which no GPU currently on the market supports. It is possible that the XSX is the first one to have this, but don't quote me on that. It might simply be Tier 3 still.
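As a rough illustration of tiling, here is a small Python sketch that works out which tiles cover the part of a texture that is actually needed. The 4096x4096 texture size, the 256x256 texel tile size and the function name are all assumptions made for the example, not figures from Microsoft.

```python
TEXTURE_SIZE = 4096  # assumed square texture, in texels
TILE_SIZE = 256      # assumed square tile, in texels


def tiles_for_region(x0: int, y0: int, x1: int, y1: int, tile: int = TILE_SIZE) -> set:
    """Return (column, row) indices of the tiles overlapping the
    half-open texel rectangle [x0, x1) x [y0, y1)."""
    cols = range(x0 // tile, (x1 - 1) // tile + 1)
    rows = range(y0 // tile, (y1 - 1) // tile + 1)
    return {(c, r) for c in cols for r in rows}


total_tiles = (TEXTURE_SIZE // TILE_SIZE) ** 2
needed = tiles_for_region(0, 0, 1024, 512)  # only this corner is visible

print(len(needed), "of", total_tiles, "tiles needed")
print(f"{len(needed) / total_tiles:.1%} of the texture is loaded")
```

Without tiling, the whole texture would have to be loaded just to use that one corner of it.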
In any case, when tiling, the size of the tiles determines how efficient you can be. The smaller the tiles, the more precisely you can load only what you need, and the less bandwidth you will need. Theoretically you could be bit-precise, so to speak, but that would require an unrealistic amount of processing power. There is an optimum there, but we don't have enough information to determine where that point is in the XSX. Apparently 64KB is typical. Microsoft is claiming that with SFS the effective multiplier can be more than 3x. This means that, after compression (everything on the SSD will inevitably be compressed), you can achieve more than 3 x 4.8 GB/s in effective streaming. To put it another way, the XSX is effectively capable of transferring 14.4 GB/s of data from the SSD. This does not mean that 14.4 GB/s is actually being transferred. Just like with compression, the amount of data actually transferred is still at most 2.4 GB/s. What it does mean is that to get the same result by loading the full, raw, uncompressed textures instead of 2.4 GB/s of compressed tiled data, you would need more than 14.4 GB/s of bandwidth. This obviously also helps RAM use, because everything you load from the SSD goes into RAM, and without SFS you would be filling RAM with texture data you never use. Basically, it decreases the load on everything, including the already mentioned RAM, the CPU and the GPU.
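The arithmetic behind the 14.4 GB/s figure can be written out as a short Python sketch. The 2x compression ratio is implied by the 2.4 and 4.8 figures, the 3x SFS multiplier is Microsoft's claimed (not guaranteed) value, and the tile count assumes the 8MB-per-4K-texture and 64KB-per-tile numbers quoted above.

```python
RAW_GBPS = 2.4           # what the SSD physically delivers
COMPRESSION_RATIO = 2.0  # assumed, implied by the 4.8 GB/s compressed figure
SFS_MULTIPLIER = 3.0     # Microsoft's claimed multiplier; situational, not guaranteed


def effective_rate(raw: float, compression: float, sfs: float) -> float:
    """Bandwidth you would need to match the result with raw, untiled data."""
    return raw * compression * sfs


rate = effective_rate(RAW_GBPS, COMPRESSION_RATIO, SFS_MULTIPLIER)
print(f"{rate:.1f} GB/s effective, {RAW_GBPS} GB/s actually transferred")

# Number of 64 KB tiles in one 8 MB texture (sizes as quoted in the post).
tiles_per_texture = (8 * 1024) // 64
print(tiles_per_texture, "tiles per texture")
```

Both multipliers describe data you avoid moving, so neither changes the 2.4 GB/s that physically crosses the bus.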
I don't see the 3x reduction in bandwidth usage by SFS as impossible. The thing is, it is not guaranteed: if you're up close to an object and that object is all you see, you cannot avoid loading the full texture at its highest quality, which means SFS gives basically zero advantage in such a case, since there is nothing to 'discard', or in better terms, nothing to avoid loading into RAM. But for far away objects that have extremely detailed textures, SFS will likely reduce the required bandwidth by quite a lot. The PS5 will have its 8-9GB/s at all times, while the benefit of SFS is sort of situational, although calling it situational kind of downplays its capability a bit, since in the majority of cases/games, you won't be hugging walls constantly.
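A rough Python illustration of why the gain depends on what is on screen. It assumes an 8MB full-detail mip and that each further mip level holds a quarter of the data of the one before it (the usual mip chain layout); the function names are hypothetical, and it ignores partial visibility within a mip.

```python
BASE_MB = 8.0  # assumed size of the most detailed mip (mip 0)


def mip_size_mb(level: int) -> float:
    """Size of a mip level, assuming each level is a quarter of the previous one."""
    return BASE_MB / (4 ** level)


def saving_factor(required_level: int) -> float:
    """How many times less data the required mip holds compared to mip 0."""
    return BASE_MB / mip_size_mb(required_level)


for level, situation in [(0, "hugging the wall"), (2, "mid distance"), (4, "far away")]:
    print(f"{situation}: mip {level}, {mip_size_mb(level)} MB, {saving_factor(level)}x less data")
```

At mip 0 the factor is 1, so there is nothing to save; the further away the object, the less of the texture's data is needed.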
And here I'm going to speculate a little bit, in comparison to the PS5. Tiled resources has been a feature in GPUs for a while, and the main part that allows this is sampler feedback. Now, you can have sampler feedback, but that does not necessarily mean that you have sampler feedback streaming. That would depend on the I/O. I recall Cerny mentioning that the GPU is custom built, and that they chose which features they wished to include and not include on the GPU. That implies they did not include everything that AMD has to offer in the GPUs. Most likely neither did MS. But if the PS5 still has this feature, then the performance difference between the SSDs stays proportionally the same as between the compressed values, 9GB/s vs 4.8 GB/s. However, considering the beefy I/O of the PS5, it is actually quite possible that Sony ditched the tiled resources feature, and instead opted to beef up the I/O to allow the streaming of the full textures instead. If this is the case, then really, the difference in the SSD performance between the two consoles will be quite minimal. Why they would do that is beyond me though, so most likely it's still in there. Whether they can stream it immediately is another story.
Just to confirm...;
So the conclusion is, the PS5 can most likely do sampler feedback streaming, but would need CPU resources for it, while the XSX does it in hardware. And some more info:
"The general process of loading texture data on-demand, rather than upfront-all-at-once, is called texture streaming in this document, or streaming for short. It makes sense to use streaming in scenarios where only some of a texture’s mips are deemed necessary for the scene. When new mips are deemed necessary– for example, if a new object appears in the scene, or if an object has moved closer into view and requires more detail– the application may choose to load more-detailed parts of the mip chain.
There is a kind of Direct3D resource particularly suitable for providing control to applications under memory-constrained scenarios: tiled resources. To avoid the need to keep all most-detailed mips of a scene’s textures in memory at the same time, applications may use tiled resources. Tiled resources offer a way to keep parts of a texture resident in memory while other parts of the texture are not resident.
To adopt SFS, an application does the following:
- Use a tiled texture (instead of a non-tiled texture), called a reserved texture resource in D3D12, for anything that needs to be streamed.
- Along with each tiled texture, create a small “MinMip map” texture and small “feedback map” texture.
  - The MinMip map represents per-region mip level clamping values for the tiled texture; it represents what is actually loaded.
  - The feedback map represents the per-region desired mip level for the tiled texture; it represents what needs to be loaded.
- Update the mip streaming engine to stream individual tiles instead of mips, using the feedback map contents to drive streaming decisions.
- When tiles are made resident or nonresident by the streaming system, the corresponding texture’s MinMip map must be updated to reflect the updated tile residency, which will clamp the GPU’s accesses to that region of the texture.
- Change shader code to read from MinMip maps and write to feedback maps. Feedback maps are written using special-purpose HLSL constructs."
Engineering specs for DirectX features.
microsoft.github.io
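To make the MinMip map / feedback map steps quoted above a bit more concrete, here is a toy simulation in plain Python. It is only a sketch of the idea and not D3D12 or HLSL code: the class, its methods, the one-dimensional list of regions and the five-level mip chain are all made up for illustration, and eviction of tiles is left out.

```python
from dataclasses import dataclass, field

COARSEST_MIP = 4  # assumed mip chain: 0 (most detailed) .. 4 (least detailed)


@dataclass
class StreamedTexture:
    regions: int
    min_mip: list = field(init=False)   # what is actually loaded, per region
    feedback: list = field(init=False)  # what the shader wanted, per region

    def __post_init__(self):
        # Start with only the coarsest mip resident everywhere.
        self.min_mip = [COARSEST_MIP] * self.regions
        self.feedback = [COARSEST_MIP] * self.regions

    def sample(self, region: int, desired_mip: int) -> int:
        """Shader side: record the desired mip, then read the best resident one."""
        self.feedback[region] = min(self.feedback[region], desired_mip)
        return max(desired_mip, self.min_mip[region])  # clamp to what is loaded

    def stream(self) -> list:
        """Streaming side: load the tiles the feedback asked for, update MinMip."""
        loaded = []
        for region in range(self.regions):
            wanted = self.feedback[region]
            # Load every missing level between what is resident and what is wanted.
            for mip in range(self.min_mip[region] - 1, wanted - 1, -1):
                loaded.append((region, mip))
            self.min_mip[region] = min(self.min_mip[region], wanted)
            self.feedback[region] = COARSEST_MIP  # reset for the next frame
        return loaded


tex = StreamedTexture(regions=4)
print(tex.sample(2, 1))  # frame 1: wants mip 1, gets the coarse fallback
print(tex.stream())      # tiles loaded for region 2 only
print(tex.sample(2, 1))  # frame 2: the requested detail is now resident
```

Only the region that was actually sampled at high detail causes any loading; the other three regions stay at the coarsest mip and cost no bandwidth.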