Map Reserved Resource to Upload Heap in DirectX 12

Indifferent to the difference?

Back in the old times, you never worried about how the physical memory would be allocated when you were dealing with OpenGL and DirectX 11; the GPU and the video memory were hidden behind the driver so well that you might not even realize they were there. Nowadays we get Vulkan and DirectX 12 (of course Metal too, but… never mind), and the "zero driver-overhead" slogan (not "Make XXX Great Again", sadly) has become reality on desktop platforms. And then it's "ohh, I didn't know that we need to manually handle the synchronization" or "ohh, why can't I directly bind that resource", and so on and on. Of course, the new generation (already not so new, actually) of graphics APIs is not for casual usage: your hands get dirtier and your head gets dizzier, while there is still a bunch of debug-layer warning and error messages popping up. Long story short: if you want something pretty and simple, turn around and rush to modern OpenGL (4.3+) or DirectX 11 and happy coding; if you want something pretty and fast, then stay with me a while and let's see what's going on with the new D3D12 memory model.

The fundamental CPU-GPU communication architecture is quite similar across different machines: you have a CPU chip, you have a GPU chip, you have some memory chips, gotcha! A typical PC with a dedicated graphics card has two kinds of memory chips: one we often refer to as the main memory, and another as the dedicated graphics card memory; the more commonly used (not so strict) naming convention for them is RAM and VRAM. On other architectures, like game consoles, the main memory and the video card memory are the same physical one; we name this kind of memory-access model UMA - Uniform Memory Access. Also, the functional microchips of the CPU and GPU may be put together or placed closer in certain designs (for example the PS4) to get optimized communication performance. You recall that you paid only once for your fancy DDR4 16GB "memory" when you were crafting your state-of-the-art PC, right? That's the "main" RAM for general purposes, like loading your OS after power-up or holding the elements of your std::vector<T>. But if you also purchased an AMD or NVIDIA dedicated graphics card, you might find printed on the package box that there are another couple of gibibytes of some sort of memory on it. That's the VRAM, where the raw texture and mesh data stay while you are playing CS:GO and swearing random Ruglish in front of your screen.

An over-simplified PC

So, if you want to render the nice SWAT soldier mesh in CS:GO, you need to load it from your disk into the VRAM and then ask the GPU to schedule some parallel rasterization work to draw it. But unfortunately, you can't access the VRAM directly in your C++ code due to the physical design of the hardware. You can reference a main-memory virtual address with semantics like Foo* bar = nullptr in C++, because it is finally compiled into machine instructions like movq $(0x108), $0 (it would actually be binary instruction data; for the sake of human readability I use assembly language here) that your CPU can execute. But generally speaking, you can't expect the same programming experience on the GPU, since it is designed for highly parallel computational tasks, and thus you can't refer to fine-grained global memory addresses directly (there are always some exceptions, but let's stay foolish for now). The start offset of a chunk of raw VRAM data should be available to you in order to create a context for the GPU to prepare and execute work. If you are familiar with OpenGL or D3D11, then you have already used interfaces such as glBindTextures or ID3D11DeviceContext::PSSetShaderResources. These two APIs don't expose the VRAM explicitly to developers; instead, you get some indirect runtime objects, like an integer handle in OpenGL or a COM object pointer in D3D11.

A step closer

The GPU is a heavily history-influenced peripheral product; as time goes by, its capability becomes more and more general and flexible. As you might know, the appearance of the Unified Scalar Shader Architecture and highly data-parallel stream processing made the GPU suitable for almost every kind of parallel computation work; the only thing lying between the programmer and the hardware is the API. The older generation of graphics APIs like OpenGL or DirectX 11 were designed with an emphasis on leading developers in a direction where they'd spend more time on the specific computer-graphics tasks they want to work on, rather than on too many low-level hardware-related details. But experience told us: more abstraction, more overhead. So when the clock ticked around 2015 and the latest generation of graphics APIs was released to mass programmers like me, a brand new, or I'd rather say "retro", design philosophy appeared among them: no more pre-defined "texture" or "mesh" or "constant buffer" object models; instead we get some new but lower-level objects such as "resource" or "buffer" or "command".

Honestly speaking, it's a little bit painful to transition the programming mindset from the OpenGL/D3D11 era to Vulkan/D3D12. It's quite like a three-year-old kid who used to ride his cute tiny bicycle with training wheels now needing to drive a six-speed manual 4WD car. Previously, you'd call a glGen* or ID3D11Device::Create* interface and get the resource handles back in no more than a few milliseconds. Now you can't even simply "invoke" functions to make the GPU do this work! But wait, couldn't we previously ask the GPU to allocate a VRAM range for us and put some nice AK-47 textures inside? We could; it's just that the graphics card vendor's implementation handled the underlying dirty business for us: all the synchronization of CPU-GPU communication, all the VRAM allocation and management, all the resource-binding details; we had never even taken a glimpse of them before! But it's not as bad as I exaggerated. You just have to take care of the additional steps which you weren't obliged to do previously, and if you succeed, you'll get not only more code in your repo but also a tremendous performance boost in your applications.

Let's forget about the API problems for a couple of minutes and take a look back at the hardware architecture, to better understand the physical structure that triggered the API "revolution". The actual memory management relies on the hardware memory bus (it's part of the I/O bridge) and the MMU - Memory Management Unit; they work together to transfer data between the processor, different external adapters, and RAM, and to map physical memory addresses to virtual ones. So when you want to load some data from your HDD to RAM, the data travels through the I/O bridge to the CPU, and then, after some parsing processes, it is stored into RAM. If you have a performance-focused attitude when writing code, you may wonder whether there is any optimization for usage cases like simply loading an executable binary file to RAM, which doesn't require any additional processing of the data itself. And yes, we have DMA - Direct Memory Access! With DMA the data doesn't need to travel through the CPU anymore; instead, it is loaded directly from the HDD to RAM.

A closer look at the Processor-Memory communication model

As we can imagine, the CPU and GPU each have their own RAM, MMU, and memory bus, so they can execute and load-store data into their RAMs individually. That's perfect: two kingdoms living peacefully with each other. But problems emerge as soon as they start to communicate; the data needs to be transferred from the CPU side to the GPU side or vice versa, and we need to build a "highway" for it. One of the "highway" hardware communication protocols widely used today is PCI-E. I'll omit the detailed instructions and focus on what we care about here: it's basically another bus-like design which provides the functionality to transfer data between different adapters, such as a dedicated graphics card and main memory. With its help, we can almost freely (sadly a highway still needs payment, it's not a freeway yet) write something utilizing the CPU and GPU together now.

Too many bridges (omitting the MMU and Memory Bus for simplification)!

The bridges are a little bit too many, aren't they? If you remember the memory architecture called UMA that I briefly introduced earlier, it basically looks like merging RAM and VRAM together. Since its design requires the chip and memory manufacturers to produce such products, and until now I've never seen one in the consumer hardware market, we can't craft it by ourselves. But still, if you have an Xbox One or PS4, you've already enjoyed the benefit of UMA.

UMA, Umami

Heap creation

So now it's time to open your favorite IDE and #include some headers. In D3D12, all resources reside inside explicitly specified memory pools, and the responsibility for managing the memory pool belongs to the developer now. This is how the interface

    HRESULT ID3D12Device::CreateHeap(
        const D3D12_HEAP_DESC *pDesc,
        REFIID                riid,
        void                  **ppvHeap
    );

comes. If you're familiar with D3D11 or other Windows APIs in the COM model, you can easily understand the function signature style. It is the combination of a pointer to a description structure instance, a COM object class's GUID, and a pointer to store the created object instance's address. The return value of the function is the execution result.

Now let's take a look at the description structure:

    typedef struct D3D12_HEAP_DESC {
        UINT64                SizeInBytes;
        D3D12_HEAP_PROPERTIES Properties;
        UINT64                Alignment;
        D3D12_HEAP_FLAGS      Flags;
    } D3D12_HEAP_DESC;

It obviously follows the consistent code style of the D3D12 API, and here we get another property structure to fill in:

    typedef struct D3D12_HEAP_PROPERTIES {
        D3D12_HEAP_TYPE         Type;
        D3D12_CPU_PAGE_PROPERTY CPUPageProperty;
        D3D12_MEMORY_POOL       MemoryPoolPreference;
        UINT                    CreationNodeMask;
        UINT                    VisibleNodeMask;
    } D3D12_HEAP_PROPERTIES;

This structure informs the device which kind of physical memory the heap should refer to. Since the documentation of D3D12 is comprehensible enough, I'd rather not repeat too many things which are already listed there. When D3D12_HEAP_TYPE Type is not D3D12_HEAP_TYPE_CUSTOM, the D3D12_CPU_PAGE_PROPERTY CPUPageProperty should always be D3D12_CPU_PAGE_PROPERTY_UNKNOWN, because the CPU accessibility of the heap has already been indicated by the D3D12_HEAP_TYPE, so you shouldn't repeat the information. For the same reason, D3D12_MEMORY_POOL MemoryPoolPreference should always be D3D12_MEMORY_POOL_UNKNOWN when D3D12_HEAP_TYPE Type is not D3D12_HEAP_TYPE_CUSTOM.
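Putting the pieces together, here is a minimal sketch of filling in the description and creating a heap. It assumes d3d12.h and wrl/client.h are included, device is a valid ID3D12Device*, and the 64 MiB size is an arbitrary budget for illustration:

    // A minimal sketch, not production code: create a default heap for buffers.
    D3D12_HEAP_DESC heapDesc = {};
    heapDesc.SizeInBytes = 64ull * 1024 * 1024;
    heapDesc.Properties.Type = D3D12_HEAP_TYPE_DEFAULT;
    // Not a custom heap, so both fields below must stay UNKNOWN:
    heapDesc.Properties.CPUPageProperty = D3D12_CPU_PAGE_PROPERTY_UNKNOWN;
    heapDesc.Properties.MemoryPoolPreference = D3D12_MEMORY_POOL_UNKNOWN;
    heapDesc.Properties.CreationNodeMask = 1; // single-adapter
    heapDesc.Properties.VisibleNodeMask = 1;
    heapDesc.Alignment = D3D12_DEFAULT_RESOURCE_PLACEMENT_ALIGNMENT; // 64 KiB
    heapDesc.Flags = D3D12_HEAP_FLAG_ALLOW_ONLY_BUFFERS; // buffer-only heaps work on every tier

    Microsoft::WRL::ComPtr<ID3D12Heap> heap;
    HRESULT hr = device->CreateHeap(&heapDesc, IID_PPV_ARGS(&heap));
    if (FAILED(hr))
    {
        // handle the error, e.g. log it and fall back to a smaller size
    }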

On a UMA architecture, there is only one physical memory pool, shared by both the CPU and GPU; the most common case is that you have an Xbox One and start to write some D3D12 games on it. In such a case, only D3D12_MEMORY_POOL_L0 is available, and thus we don't need to take care of it at all.

Most desktop PCs with a dedicated graphics card have a NUMA memory architecture (although in recent years things like AMD's hUMA have appeared and gone); in this case D3D12_MEMORY_POOL_L0 is the RAM and D3D12_MEMORY_POOL_L1 is the VRAM.
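If you'd rather not guess which case you are in, the runtime can tell you. A minimal sketch, assuming device is a valid ID3D12Device*:

    // Query whether the adapter is UMA or NUMA.
    D3D12_FEATURE_DATA_ARCHITECTURE arch = {};
    arch.NodeIndex = 0;
    if (SUCCEEDED(device->CheckFeatureSupport(
            D3D12_FEATURE_ARCHITECTURE, &arch, sizeof(arch))))
    {
        if (arch.UMA)
        {
            // One physical pool shared by CPU and GPU; only L0 is meaningful.
        }
        else
        {
            // NUMA: L0 is the system RAM, L1 is the VRAM.
        }
    }

    // The runtime can also report what a standard heap type translates to
    // on the current adapter:
    D3D12_HEAP_PROPERTIES uploadProps =
        device->GetCustomHeapProperties(1, D3D12_HEAP_TYPE_UPLOAD);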

So now, if we set the heap type to D3D12_HEAP_TYPE_CUSTOM, we can take more flexible control over the heap configuration. I'll list a chart below showing how different combinations of D3D12_CPU_PAGE_PROPERTY and D3D12_MEMORY_POOL would finally look on NUMA architectures.

L0 + NOT_AVAILABLE: Similar to D3D12_HEAP_TYPE_DEFAULT, a GPU-access-only RAM (a little bit nonsensical for common usage cases).
L0 + WRITE_COMBINE: Similar to D3D12_HEAP_TYPE_UPLOAD. It is uncached for CPU read operations, so read results won't always stay coherent, but write operations are faster because the memory ordering becomes trivial and irrelevant; perfect for the GPU to read.
L0 + WRITE_BACK: Similar to D3D12_HEAP_TYPE_READBACK. All GPU write operations are cached, and CPU read operations get a coherent and consistent result.
L1 + NOT_AVAILABLE: Similar to D3D12_HEAP_TYPE_DEFAULT, a GPU-access-only VRAM.
L1 + WRITE_COMBINE: Invalid; the CPU can't access VRAM directly.
L1 + WRITE_BACK: Invalid; the CPU can't access VRAM directly.

It looks like we don't need a custom heap property structure on NUMA architectures (or in the single-engine/single-adapter case); all the useful heap types have already been provided by the pre-defined types, and there is not much room for us to maneuver in to get some advanced optimization. But if your application wants better customization for all the possible hardware it may run on, then using custom heap properties is still worth investigating.
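As an illustration, here is a minimal sketch of a custom heap property combination that mimics D3D12_HEAP_TYPE_UPLOAD on a NUMA adapter; it could be dropped into the Properties field of the D3D12_HEAP_DESC shown earlier:

    // A hand-rolled equivalent of D3D12_HEAP_TYPE_UPLOAD, i.e.
    // write-combined CPU pages living in the L0 (system RAM) pool.
    D3D12_HEAP_PROPERTIES customProps = {};
    customProps.Type = D3D12_HEAP_TYPE_CUSTOM;
    customProps.CPUPageProperty = D3D12_CPU_PAGE_PROPERTY_WRITE_COMBINE;
    customProps.MemoryPoolPreference = D3D12_MEMORY_POOL_L0;
    customProps.CreationNodeMask = 1;
    customProps.VisibleNodeMask = 1;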

The Processor-Memory model in D3D12

And finally, we have a misc flag mask to indicate the detailed usage of the heap:

    typedef enum D3D12_HEAP_FLAGS {
        D3D12_HEAP_FLAG_NONE,
        D3D12_HEAP_FLAG_SHARED,
        D3D12_HEAP_FLAG_DENY_BUFFERS,
        D3D12_HEAP_FLAG_ALLOW_DISPLAY,
        D3D12_HEAP_FLAG_SHARED_CROSS_ADAPTER,
        D3D12_HEAP_FLAG_DENY_RT_DS_TEXTURES,
        D3D12_HEAP_FLAG_DENY_NON_RT_DS_TEXTURES,
        D3D12_HEAP_FLAG_HARDWARE_PROTECTED,
        D3D12_HEAP_FLAG_ALLOW_WRITE_WATCH,
        D3D12_HEAP_FLAG_ALLOW_SHADER_ATOMICS,
        D3D12_HEAP_FLAG_ALLOW_ALL_BUFFERS_AND_TEXTURES,
        D3D12_HEAP_FLAG_ALLOW_ONLY_BUFFERS,
        D3D12_HEAP_FLAG_ALLOW_ONLY_NON_RT_DS_TEXTURES,
        D3D12_HEAP_FLAG_ALLOW_ONLY_RT_DS_TEXTURES
    } D3D12_HEAP_FLAGS;

Depending on the specific D3D12_RESOURCE_HEAP_TIER that different hardware supports, certain D3D12_HEAP_FLAGS are not allowed to be used alone or combined together. The further details are well documented on the official website, so I'll not discuss them here. Because some of the enum values are simply aliases of others, the actual number of possible heap flag combinations is smaller than the number defined, and I'll list a chart below to demonstrate the different usage cases and the corresponding flags.

All resource types. Tier 1: not supported. Tier 2: D3D12_HEAP_FLAG_ALLOW_ALL_BUFFERS_AND_TEXTURES or D3D12_HEAP_FLAG_NONE.
Buffers only. Tier 1: D3D12_HEAP_FLAG_DENY_RT_DS_TEXTURES | D3D12_HEAP_FLAG_DENY_NON_RT_DS_TEXTURES. Tier 2: D3D12_HEAP_FLAG_ALLOW_ONLY_BUFFERS (an alias of the Tier 1 combination).
Non-RT/DS textures only. Tier 1: D3D12_HEAP_FLAG_DENY_BUFFERS | D3D12_HEAP_FLAG_DENY_RT_DS_TEXTURES. Tier 2: D3D12_HEAP_FLAG_ALLOW_ONLY_NON_RT_DS_TEXTURES (an alias of the Tier 1 combination).
RT/DS textures only. Tier 1: D3D12_HEAP_FLAG_DENY_BUFFERS | D3D12_HEAP_FLAG_DENY_NON_RT_DS_TEXTURES. Tier 2: D3D12_HEAP_FLAG_ALLOW_ONLY_RT_DS_TEXTURES (an alias of the Tier 1 combination).
Swap-chain surfaces only. Both tiers: D3D12_HEAP_FLAG_ALLOW_DISPLAY.
Shared heap (multi-process). Both tiers: D3D12_HEAP_FLAG_SHARED.
Shared heap (multi-adapter). Both tiers: D3D12_HEAP_FLAG_SHARED_CROSS_ADAPTER.
Memory write tracking. Both tiers: D3D12_HEAP_FLAG_ALLOW_WRITE_WATCH.
Atomic primitives. Both tiers: D3D12_HEAP_FLAG_ALLOW_SHADER_ATOMICS.

As you can see above, the only meaningful difference between Tier 1 and Tier 2 here is that Tier 2 supports the D3D12_HEAP_FLAG_ALLOW_ALL_BUFFERS_AND_TEXTURES flag, so we can put all the common resource kinds into one heap. Again, it depends on the specific job you'd like to finish; sometimes you want an all-in-one heap, sometimes it's better to separate resources into different heaps by usage case.
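Since the legal flags depend on the reported tier, here is a minimal sketch of querying the tier and choosing flags accordingly, assuming device is a valid ID3D12Device*:

    // Query the resource heap tier and pick heap flags to match.
    D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
    device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS,
                                &options, sizeof(options));

    D3D12_HEAP_FLAGS heapFlags;
    if (options.ResourceHeapTier >= D3D12_RESOURCE_HEAP_TIER_2)
    {
        // Tier 2: buffers and all texture kinds may live in one heap.
        heapFlags = D3D12_HEAP_FLAG_ALLOW_ALL_BUFFERS_AND_TEXTURES;
    }
    else
    {
        // Tier 1: a heap must be dedicated to one resource category.
        heapFlags = D3D12_HEAP_FLAG_ALLOW_ONLY_BUFFERS;
    }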

Resource creation

After you've created a heap successfully, you can start to create resources inside it. There are 3 different ways to create a resource:

  1. Create a resource which has only a virtual address inside the already-created heap; this requires us to map it to physical memory manually later. ID3D12Device::CreateReservedResource is the interface for such a task;
  2. Create a resource which has both a virtual address and a mapped physical address inside the already-created heap; the most commonly used resources are of this type. ID3D12Device::CreatePlacedResource is the interface for such a task;
  3. Create a placed resource and an implicit heap at the same time. ID3D12Device::CreateCommittedResource is the interface for such a task.

If you don't want to manually manage the heap memory at all, then you can choose to use committed resources with some sacrifice in performance; but naturally it's not a good idea to stick with committed resources heavily in production code (unless you're lazy like me and don't want to write more code in show-case projects). The more mature choice is using placed resources: since we can already create heaps, the only thing left to do is to design a heap memory management module with some efficient strategies. You can reuse as many design patterns and architectures as you like from your experience implementing main-RAM heap memory management (still malloc() within 16 ms? No way!). A ring buffer or a double buffer for the Upload heap, some linked lists for the Default heap, or whatever; there are no limits on the imagination, just analyze your application's requirements and figure out a suitable solution (but don't write a messy GC system for it :). There shouldn't be too many choices, since in most D3D12 applications, like a game, most of the resources are CPU write-once, and the others are dynamic buffers which don't occupy too much space but update frequently. A minimal sketch of the placed-resource path follows below.
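The sketch assumes device is a valid ID3D12Device* and heap is the buffer-capable ID3D12Heap created earlier; the 1 MiB size and the zero offset are arbitrary, a real allocator would hand out offsets from its own bookkeeping:

    // Describe a plain buffer; buffers must use the ROW_MAJOR layout.
    D3D12_RESOURCE_DESC bufferDesc = {};
    bufferDesc.Dimension = D3D12_RESOURCE_DIMENSION_BUFFER;
    bufferDesc.Width = 1024 * 1024;
    bufferDesc.Height = 1;
    bufferDesc.DepthOrArraySize = 1;
    bufferDesc.MipLevels = 1;
    bufferDesc.Format = DXGI_FORMAT_UNKNOWN;
    bufferDesc.SampleDesc.Count = 1;
    bufferDesc.Layout = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;

    // Ask the runtime how much heap space and which alignment it really
    // needs; an allocator would use this to find a free range in the heap.
    D3D12_RESOURCE_ALLOCATION_INFO info =
        device->GetResourceAllocationInfo(0, 1, &bufferDesc);

    Microsoft::WRL::ComPtr<ID3D12Resource> buffer;
    HRESULT hr = device->CreatePlacedResource(
        heap.Get(), 0, // the heap and a byte offset inside it
        &bufferDesc, D3D12_RESOURCE_STATE_COMMON,
        nullptr, // optimized clear values only apply to RT/DS textures
        IID_PPV_ARGS(&buffer));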

More advanced situations rely on a tremendous memory size, such as mega-textures (maybe you need a 64x64 km^2 terrain albedo texture?) or sparse-tree volume textures (maybe you need a voxel-cone-traced irradiance volume?), which would easily run past the physical VRAM address range, or where the actual texture size is beyond the maximum the hardware supports. In such cases a dynamic virtual memory address mapping technique is necessary. Developers tended to implement software cache solutions for this problem in the past, because the APIs didn't provide any reliable functionality at that time (before D3D11.2 and OpenGL 4.4, which started to support tiled/sparse textures). The reserved resources in D3D12 are the fresh new one-for-all solution today; they inherit the design of the tiled-resource architecture in D3D11 but also provide more flexibility. But it still depends on hardware support: when you wonder how to fit your elegant and complex SVOGI volume texture into the VRAM, it's better to query D3D12_TILED_RESOURCES_TIER first and see whether the target hardware supports tiled resources at all, as the sketch below does.
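A minimal sketch of the reserved-resource path, assuming device is a valid ID3D12Device*, queue is a valid ID3D12CommandQueue*, and heap was created with flags that allow non-RT/DS textures; the 16384x16384 size and the single-tile mapping are arbitrary illustrations:

    // Reserved resources need tiled-resource support; check the tier first.
    D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
    device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS,
                                &options, sizeof(options));
    if (options.TiledResourcesTier == D3D12_TILED_RESOURCES_TIER_NOT_SUPPORTED)
        return; // bail out; no point continuing on this adapter

    // A large texture that only reserves virtual address space at creation;
    // reserved resources must use the 64KB_UNDEFINED_SWIZZLE layout.
    D3D12_RESOURCE_DESC texDesc = {};
    texDesc.Dimension = D3D12_RESOURCE_DIMENSION_TEXTURE2D;
    texDesc.Width = 16384;
    texDesc.Height = 16384;
    texDesc.DepthOrArraySize = 1;
    texDesc.MipLevels = 1;
    texDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
    texDesc.SampleDesc.Count = 1;
    texDesc.Layout = D3D12_TEXTURE_LAYOUT_64KB_UNDEFINED_SWIZZLE;

    Microsoft::WRL::ComPtr<ID3D12Resource> reserved;
    device->CreateReservedResource(&texDesc, D3D12_RESOURCE_STATE_COMMON,
                                   nullptr, IID_PPV_ARGS(&reserved));

    // Map the first tile of the texture to the first 64 KiB tile of the heap.
    // Note that the mapping is executed on a command queue, not on the device.
    D3D12_TILED_RESOURCE_COORDINATE startCoord = {}; // tile (0,0,0), subresource 0
    D3D12_TILE_REGION_SIZE regionSize = {};
    regionSize.NumTiles = 1;
    D3D12_TILE_RANGE_FLAGS rangeFlags = D3D12_TILE_RANGE_FLAG_NONE;
    UINT heapRangeStart = 0; // measured in tiles
    UINT rangeTileCount = 1;
    queue->UpdateTileMappings(reserved.Get(), 1, &startCoord, &regionSize,
                              heap.Get(), 1, &rangeFlags, &heapRangeStart,
                              &rangeTileCount, D3D12_TILE_MAPPING_FLAG_NONE);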


Source: https://zhangdoa.com/posts/walking-through-the-heap-properties-in-directx-12/
