WOLF Advanced Technology explains the progression of NVIDIA® GPU architecture from Turing to Blackwell, concentrating on high-end embedded GPUs including Turing TU104, Ampere GA104, Ada AD103 and Blackwell GB203.
The paper describes how NVIDIA maintained a consistent architectural framework while integrating improvements to CUDA core data paths, Tensor core capabilities, Ray Tracing cores and memory systems. These developments were supported by advances in manufacturing processes that enabled denser chips, higher clock speeds and increased energy efficiency.
WOLF details high-level component blocks across generations, noting upgrades such as the PCIe host interface advancing from Gen 3 in Turing to Gen 5 in Blackwell, and the transition from GDDR6 to GDDR7 memory with higher bandwidth and support for higher-density 3 GB modules. It also highlights increases in maximum memory bandwidth, changes to L2 cache capacity and updates to NVENC and NVDEC blocks.
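As a rough illustration of why faster memory and denser modules matter, peak bandwidth and capacity can be estimated from bus width, per-pin data rate and module density. The sketch below is a back-of-the-envelope calculation, not a method from the whitepaper; the 256-bit bus, the 14 and 28 Gbit/s pin rates and the one-module-per-32-bit-channel layout are assumed inputs chosen for illustration.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: data pins * per-pin rate (Gbit/s) / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8


def capacity_gb(bus_width_bits: int, module_gb: int, module_width_bits: int = 32) -> int:
    """Total capacity assuming one memory module per 32-bit slice of the bus."""
    return (bus_width_bits // module_width_bits) * module_gb


# Illustrative inputs only (assumptions, not figures quoted from the paper).
BUS_WIDTH_BITS = 256
gddr6_bw = peak_bandwidth_gb_s(BUS_WIDTH_BITS, 14.0)  # assumed GDDR6 pin rate
gddr7_bw = peak_bandwidth_gb_s(BUS_WIDTH_BITS, 28.0)  # assumed GDDR7 pin rate

print(f"GDDR6: {gddr6_bw:.0f} GB/s, GDDR7: {gddr7_bw:.0f} GB/s")
print(f"2 GB modules: {capacity_gb(BUS_WIDTH_BITS, 2)} GB, "
      f"3 GB modules: {capacity_gb(BUS_WIDTH_BITS, 3)} GB")
```

Under these assumptions, doubling the pin rate doubles peak bandwidth on the same bus, and moving from 2 GB to 3 GB modules raises capacity by half without widening the bus.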
Architectural refinements within Graphics Processing Clusters (GPCs) included integrating Raster Operation Partitions (ROPs) directly into each GPC and increasing the number of Texture Processing Clusters (TPCs) per GPC as architectures evolved.
Key changes to Streaming Multiprocessors are described in detail, including modifications to CUDA datapaths that progressed from fixed FP32 and INT32 lanes in Turing to fully flexible FP32/INT32 allocation in Blackwell.
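The effect of moving from fixed to flexible lanes can be sketched with a toy issue model. This is a simplified lower bound that ignores scheduling, memory and dependencies; the lane counts per configuration (64 fixed FP32 plus 64 fixed INT32, an intermediate 64 FP32 plus 64 shared, and 128 fully shared) are assumptions used for illustration and should be checked against the whitepaper. The function `min_issue_cycles` and the configuration names are hypothetical.

```python
from math import ceil


def min_issue_cycles(fp32_ops: int, int32_ops: int,
                     fp_only: int, int_only: int, flexible: int) -> int:
    """Lower bound on cycles to issue a mix of FP32 and INT32 operations.

    fp_only / int_only lanes accept one type; flexible lanes accept either.
    """
    return max(
        ceil(fp32_ops / (fp_only + flexible)),    # FP32 work limited to FP-capable lanes
        ceil(int32_ops / (int_only + flexible)),  # INT32 work limited to INT-capable lanes
        ceil((fp32_ops + int32_ops) / (fp_only + int_only + flexible)),  # total lanes
    )


# Assumed per-SM lane layouts (illustrative, not quoted from the paper).
CONFIGS = {
    "fixed lanes":        dict(fp_only=64, int_only=64, flexible=0),
    "partially flexible": dict(fp_only=64, int_only=0, flexible=64),
    "fully flexible":     dict(fp_only=0, int_only=0, flexible=128),
}

for name, lanes in CONFIGS.items():
    fp_heavy = min_issue_cycles(1280, 0, **lanes)
    int_heavy = min_issue_cycles(0, 1280, **lanes)
    print(f"{name}: FP32-only {fp_heavy} cycles, INT32-only {int_heavy} cycles")
```

In this model the fixed layout leaves half the lanes idle on a single-type workload, while the fully flexible layout keeps all lanes busy regardless of the instruction mix.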
The paper traces Tensor core evolution from Gen1 through Gen5, documenting added data precision modes such as TF32, BF16, FP8, FP4 and FP6, along with support for sparsity and microscaling. Ray Tracing cores are likewise tracked from their introduction in Turing through successive generations, each delivering significant gains in throughput and rendering efficiency.
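To make the sparsity support concrete, the sketch below prunes a weight list to a 2:4 structured pattern, in which at most two of every four consecutive values are non-zero. The summary above does not specify the pattern; 2:4 is assumed here because it is the structured-sparsity scheme NVIDIA has described for its Tensor cores. The helper `prune_2_of_4` is hypothetical and uses simple magnitude-based selection, which is only one possible pruning rule.

```python
def prune_2_of_4(weights: list[float]) -> list[float]:
    """Zero the two smallest-magnitude values in every group of four."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")

    pruned: list[float] = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest magnitudes (ties resolved by position).
        by_magnitude = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)
        keep = set(by_magnitude[:2])
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


print(prune_2_of_4([1.0, -3.0, 2.0, 0.5, 4.0, 4.0, -1.0, 0.0]))
```

Because exactly half of each group is zero, hardware that understands the pattern can store only the kept values plus small position metadata and skip the zeroed multiplications.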
WOLF concludes its whitepaper by outlining NVIDIA’s accompanying software ecosystem, including the CUDA Toolkit, CUDA compute capability information, and AI and HPC SDKs. It notes that each new GPU generation is supported with updated tools designed to help developers access new architectural features.
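One practical use of compute capability is selecting the target architecture when compiling CUDA code. The sketch below maps the four GPUs discussed here to compute capability values and builds the matching `nvcc -gencode` option. The values in the table are stated from general knowledge rather than from the whitepaper and should be verified against NVIDIA's published compute capability list; the helper `gencode_flag` is hypothetical.

```python
# Assumed compute capability (major, minor) per GPU; verify against NVIDIA's tables.
COMPUTE_CAPABILITY = {
    "TU104": (7, 5),   # Turing
    "GA104": (8, 6),   # Ampere
    "AD103": (8, 9),   # Ada
    "GB203": (12, 0),  # Blackwell
}


def gencode_flag(gpu: str) -> str:
    """Build an nvcc -gencode option targeting the given GPU's compute capability."""
    major, minor = COMPUTE_CAPABILITY[gpu]
    cc = f"{major}{minor}"
    return f"-gencode arch=compute_{cc},code=sm_{cc}"


for gpu in COMPUTE_CAPABILITY:
    print(gpu, gencode_flag(gpu))
```

Newer targets generally require a correspondingly recent CUDA Toolkit, which is why each GPU generation ships with updated tools.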
Overall, WOLF emphasizes that each step from Turing to Blackwell brought substantial performance increases and new capabilities relevant to graphics, compute and AI workloads.





