GITNUXREPORT 2026

Nvidia Blackwell Statistics

NVIDIA Blackwell GPUs deliver high performance and advanced architecture features.

Rajesh Patel

Team Lead & Senior Researcher with over 15 years of experience in market research and data analytics.

First published: Feb 24, 2026

Our Commitment to Accuracy

Rigorous fact-checking · Reputable sources · Regular updates

Key Statistics

Statistic 1

NVIDIA Blackwell B100 GPU features 208 billion transistors

Statistic 2

Blackwell platform includes 192 Streaming Multiprocessors (SMs) per GPU

Statistic 3

Each Blackwell SM has 128 FP32 CUDA cores

Statistic 4

Blackwell introduces 5th Gen Tensor Cores supporting FP4 precision

Statistic 5

The Blackwell B100 is built from two reticle-limited dies on the TSMC 4NP process

Statistic 6

Blackwell GPUs feature a dual-die design connected via NV-HBI

Statistic 7

2nd Gen Transformer Engine in Blackwell supports FP4/FP6/FP8

Statistic 8

Blackwell includes Decompression Engine delivering 800 GB/s throughput

Statistic 9

RAS Engine in Blackwell provides 10x faster error detection

Statistic 10

Blackwell GPU supports 5th Gen NVLink with 1.8 TB/s bidirectional bandwidth per GPU

Statistic 11

NVIDIA Blackwell B200 offers 20 petaFLOPS of FP4 AI performance

Statistic 12

GB200 Superchip combines two Blackwell GPUs with a Grace CPU

Statistic 13

Blackwell features 16x more transistors than Hopper in Tensor Cores

Statistic 14

Each Blackwell Tensor Core processes 2.5x more data than Hopper

Statistic 15

Blackwell includes one 4th Gen RT Core per SM

Statistic 16

9th Gen NVIDIA NVENC encoder in Blackwell supports AV1 8K60

Statistic 17

Blackwell decoder supports 2x AV1 decoding performance

Statistic 18

OPAI Engine in Blackwell for inference optimization

Statistic 19

Blackwell GPU has 132 MB L2 cache

Statistic 20

Dual NVENC and NVDEC engines in Blackwell accelerate video encode and decode

Statistic 21

Blackwell architecture supports FP8 with E4M3 and E5M2 formats

Statistic 22

Blackwell's 4th Gen RT Cores deliver a 2x ray-triangle intersection rate

Statistic 23

Blackwell includes SHARP precision multipliers for AI

Statistic 24

GPU Boost clock for B100 is up to 1.62 GHz

Statistic 25

Blackwell B100 has 192 SMs confirmed

Statistic 26

B100 GPU has 192 GB HBM3e memory capacity

Statistic 27

HBM3e memory on Blackwell runs at 5.2 TB/s per stack

Statistic 28

Blackwell supports up to 12 HBM3e stacks

Statistic 29

NVLink 5th Gen provides 18 ports at 100 GB/s each bidirectional

Statistic 30

GB200 NVL72 rack has 141 GB HBM3e per GPU effectively scaled

Statistic 31

L2 cache in Blackwell is 132 MB per GPU

Statistic 32

Memory bandwidth for B200 SXM is 8 TB/s

Statistic 33

PCIe Gen5 x16 interface with 128 GB/s bandwidth

Statistic 34

NVLink Switch supports 144 GPUs at 1.8 TB/s each

Statistic 35

Blackwell HBM3e at 9.2 Gbps pin speed

Statistic 36

NVL72 interconnect bandwidth totals 130 TB/s

Statistic 37

Grace CPU in GB200 has 480 GB LPDDR5X memory

Statistic 38

NV-HBI link between dies at 10 TB/s bidirectional

Statistic 39

Blackwell supports 8 HBM3e stacks on B100 PCIe

Statistic 40

Third-party HBM3e for Blackwell up to 16 stacks possible

Statistic 41

SHARP in-network compute reduces data movement

Statistic 42

NVL72 liquid-cooled design for full memory utilization

Statistic 43

Blackwell L1 cache per SM is 256 KB

Statistic 44

NVLink domain supports up to 576 GPUs

Statistic 45

B200 has 192 GB HBM3E at 8 TB/s bandwidth

Statistic 46

Grace-Blackwell NVLink-C2C at 900 GB/s

Statistic 47

Decompression Engine handles 64:1 ratios at line rate

Statistic 48

Blackwell B100 memory config: 12x 16 GB stacks

Statistic 49

Blackwell B100 AI training performance is 30x faster than H100 for GPT-MoE-1.8T

Statistic 50

GB200 NVL72 delivers 1.4 exaFLOPS of AI performance at FP4

Statistic 51

Blackwell inference is 30x faster than Hopper for Llama 2 70B

Statistic 52

B200 GPU offers 40 petaFLOPS FP4 Tensor performance

Statistic 53

GB200 Superchip trains GPT-MoE 1.8T model 4x faster than H100

Statistic 54

Blackwell platform renders 5x faster in NVIDIA RTX

Statistic 55

NVL72 rack with 72 Blackwell GPUs scales to 130 TB/s bandwidth

Statistic 56

Blackwell FP8 performance reaches 10 petaFLOPS per GPU

Statistic 57

25x faster inference on GPT-MoE-1.8T vs H100 SXM

Statistic 58

Blackwell B100 FP16 performance is 10 petaFLOPS

Statistic 59

GB200 NVL72 simulates physical AI 30x faster

Statistic 60

Blackwell accelerates drug discovery 4x vs Hopper

Statistic 61

90x less cost and energy for inference with FP4 on Blackwell

Statistic 62

Blackwell B200 delivers 2.5x more performance per watt

Statistic 63

Llama 3.1 405B inference 4x faster on GB200 vs H100

Statistic 64

Blackwell NVL72 handles 30x more users for chatbots

Statistic 65

5x faster AI rendering in Omniverse

Statistic 66

Blackwell FP4 training throughput 2x Hopper

Statistic 67

GB200 scales to 864 GPUs in NVL72 clusters

Statistic 68

Blackwell Mixture of Experts training 4x faster

Statistic 69

RTX 5090 based on Blackwell achieves 2x rasterization vs 4090

Statistic 70

Blackwell professional viz 4x faster path tracing

Statistic 71

B100 PCIe version 20 petaFLOPS FP4

Statistic 72

Blackwell NVL72 FP8 performance 720 petaFLOPS

Statistic 73

Blackwell B100 TDP is 700W for air-cooled version

Statistic 74

B200 SXM TDP reaches 1000W with liquid cooling

Statistic 75

GB200 NVL72 rack consumes 120 kW total power

Statistic 76

Blackwell delivers 25x better energy efficiency for inference

Statistic 77

4x better perf per watt in training vs Hopper H100

Statistic 78

NVL72 achieves 277 TFLOPS/rack FP64 at 25x less power

Statistic 79

Blackwell GPU voltage optimized for 4NP process efficiency

Statistic 80

Liquid cooling in GB200 reduces power by 300 kW per rack vs air

Statistic 81

Blackwell 2.5x perf/watt improvement via FP4

Statistic 82

RAS engine reduces power overhead for reliability

Statistic 83

B100 PCIe TDP 700W with 600W sustained

Statistic 84

GB200 Superchip TDP 2700W combined

Statistic 85

90% reduction in cost and energy for trillion-parameter inference

Statistic 86

Blackwell efficiency enables 1 MW AI factories

Statistic 87

TSMC 4NP process yields 15% perf boost at iso-power

Statistic 88

OPAI reduces power for sparse inference

Statistic 89

Full NVL72 clusters operate efficiently at up to 1.2 MW power density

Statistic 90

Blackwell Tensor Cores 30% more efficient at low precision

Statistic 91

Dynamic power management in Blackwell SMs

Statistic 92

10x better total cost of ownership for AI clusters

Statistic 93

Blackwell GB200 NVL72 available Q4 2024

Statistic 94

Partners include AWS, Google, Microsoft, and Oracle for Blackwell deployment

Statistic 95

DGX B200 systems with 8 Blackwell GPUs shipping 2025

Statistic 96

HGX B200 for OEM integration announced

Statistic 97

NVIDIA AI Enterprise software optimized for Blackwell

Statistic 98

Blackwell production on TSMC 4NP started H1 2024

Statistic 99

GB200 NVL72 pre-orders from major hyperscalers

Statistic 100

CUDA 12.3 supports Blackwell preview

Statistic 101

NVIDIA NIM microservices for Blackwell inference

Statistic 102

Blackwell in RTX 50-series consumer GPUs late 2024

Statistic 103

Annual Blackwell production over 500,000 GPUs estimated

Statistic 104

Price for B100 around $30,000-$40,000 per unit rumored

Statistic 105

NVL72 rack priced at $3 million each

Statistic 106

Blackwell validated on Neoverse V2 for Grace

Statistic 107

Support for BlueField-3 DPUs in Blackwell systems

Statistic 108

Omniverse Cloud runs on Blackwell clusters

Statistic 109

Blackwell powers Project DIGITS supercomputer

Statistic 110

Mass production of GB200 started Q3 2024

Statistic 111

Blackwell PCIe boards for standard servers Q1 2025

NVIDIA's Blackwell platform sets new benchmarks for AI performance, efficiency, and scalability. Against the H100, NVIDIA quotes up to 30x faster training, 25x more efficient inference (with roughly 90% lower cost and energy for trillion-parameter models), and 4x better performance per watt, with the B200 GPU rated at up to 40 petaFLOPS of FP4 and a GB200 NVL72 rack reaching 1.4 exaFLOPS. Each GPU packs 208 billion transistors into a dual-die design linked by NV-HBI, with 192 Streaming Multiprocessors carrying 128 FP32 CUDA cores apiece, 4th Gen RT Cores boasting a 2x ray-triangle intersection rate, 5th Gen Tensor Cores that process 2.5x more data and use 16x more transistors than Hopper's, 192 GB of HBM3e memory, an 800 GB/s Decompression Engine, and a 2nd Gen Transformer Engine. Production started in 2024, consumer RTX 50-series GPUs followed by late 2024, racks scale to 864 GPUs with 130 TB/s of bandwidth, and deployment partners include AWS, Google, and Microsoft.

Key Takeaways

  • NVIDIA Blackwell B100 GPU features 208 billion transistors
  • Blackwell platform includes 192 Streaming Multiprocessors (SMs) per GPU
  • Each Blackwell SM has 128 FP32 CUDA cores
  • Blackwell B100 AI training performance is 30x faster than H100 for GPT-MoE-1.8T
  • GB200 NVL72 delivers 1.4 exaFLOPS of AI performance at FP4
  • Blackwell inference is 30x faster than Hopper for Llama 2 70B
  • B100 GPU has 192 GB HBM3e memory capacity
  • HBM3e memory on Blackwell runs at 5.2 TB/s per stack
  • Blackwell supports up to 12 HBM3e stacks
  • Blackwell B100 TDP is 700W for air-cooled version
  • B200 SXM TDP reaches 1000W with liquid cooling
  • GB200 NVL72 rack consumes 120 kW total power
  • Blackwell GB200 NVL72 available Q4 2024
  • Partners include AWS, Google, Microsoft, and Oracle for Blackwell deployment
  • DGX B200 systems with 8 Blackwell GPUs shipping 2025

NVIDIA Blackwell GPUs deliver high performance and advanced architecture features.

Architecture Specs

  • NVIDIA Blackwell B100 GPU features 208 billion transistors
  • Blackwell platform includes 192 Streaming Multiprocessors (SMs) per GPU
  • Each Blackwell SM has 128 FP32 CUDA cores
  • Blackwell introduces 5th Gen Tensor Cores supporting FP4 precision
  • The Blackwell B100 is built from two reticle-limited dies on the TSMC 4NP process
  • Blackwell GPUs feature a dual-die design connected via NV-HBI
  • 2nd Gen Transformer Engine in Blackwell supports FP4/FP6/FP8
  • Blackwell includes Decompression Engine delivering 800 GB/s throughput
  • RAS Engine in Blackwell provides 10x faster error detection
  • Blackwell GPU supports 5th Gen NVLink with 1.8 TB/s bidirectional bandwidth per GPU
  • NVIDIA Blackwell B200 offers 20 petaFLOPS of FP4 AI performance
  • GB200 Superchip combines two Blackwell GPUs with a Grace CPU
  • Blackwell features 16x more transistors than Hopper in Tensor Cores
  • Each Blackwell Tensor Core processes 2.5x more data than Hopper
  • Blackwell includes one 4th Gen RT Core per SM
  • 9th Gen NVIDIA NVENC encoder in Blackwell supports AV1 8K60
  • Blackwell decoder supports 2x AV1 decoding performance
  • OPAI Engine in Blackwell for inference optimization
  • Blackwell GPU has 132 MB L2 cache
  • Dual NVENC and NVDEC engines in Blackwell accelerate video encode and decode
  • Blackwell architecture supports FP8 with E4M3 and E5M2 formats
  • Blackwell's 4th Gen RT Cores deliver a 2x ray-triangle intersection rate
  • Blackwell includes SHARP precision multipliers for AI
  • GPU Boost clock for B100 is up to 1.62 GHz
  • Blackwell B100 has 192 SMs confirmed

Architecture Specs Interpretation

NVIDIA's Blackwell GPUs are engineering powerhouses. Each packs 208 billion transistors across two reticle-limited TSMC 4NP dies joined by a 10 TB/s NV-HBI link, with 192 Streaming Multiprocessors that each carry 128 FP32 CUDA cores and a 4th Gen RT Core offering double the ray-triangle intersection rate. The 5th Gen Tensor Cores handle FP4 precision, process 2.5x more data than Hopper's, and use 16x more transistors than their predecessors. Around the compute sit an 800 GB/s Decompression Engine, 1.8 TB/s of 5th Gen NVLink, a RAS Engine with 10x faster error detection, a 2nd Gen Transformer Engine for FP4/FP6/FP8 mixed precision, SHARP precision multipliers, the OPAI Engine for inference optimization, 132 MB of L2 cache, and dual NVENC/NVDEC blocks, with the 9th Gen NVENC encoding AV1 at 8K60 and the decoder running AV1 twice as fast. The B200 is quoted at 20 petaFLOPS of FP4 AI performance, the GB200 Superchip pairs two Blackwell GPUs with a Grace CPU, and boost clocks reach 1.62 GHz: brains to match the brawn, from large-scale AI to ray tracing and encoding.
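As a quick sanity check on these figures, the quoted SM count, core count, and boost clock imply a peak FP32 rate of roughly 80 TFLOPS. The sketch below is a back-of-the-envelope calculation only; treating each core as retiring one fused multiply-add (two FLOPs) per clock is an assumption based on the usual peak-FLOPS convention, and real workloads land well below peak.

```python
# Peak FP32 throughput implied by the quoted figures:
# 192 SMs x 128 FP32 cores x 2 ops/clock (FMA) x 1.62 GHz.

SMS = 192                    # Streaming Multiprocessors per GPU
CORES_PER_SM = 128           # FP32 CUDA cores per SM
OPS_PER_CORE_PER_CLOCK = 2   # one FMA = 2 FLOPs (assumed convention)
BOOST_CLOCK_GHZ = 1.62       # quoted GPU Boost clock

# cores x ops/clock x GHz gives GFLOPS; divide by 1e3 for TFLOPS
peak_fp32_tflops = (SMS * CORES_PER_SM * OPS_PER_CORE_PER_CLOCK
                    * BOOST_CLOCK_GHZ / 1e3)
print(f"Implied peak FP32: {peak_fp32_tflops:.1f} TFLOPS")  # ~79.6 TFLOPS
```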

Memory and Bandwidth

  • B100 GPU has 192 GB HBM3e memory capacity
  • HBM3e memory on Blackwell runs at 5.2 TB/s per stack
  • Blackwell supports up to 12 HBM3e stacks
  • NVLink 5th Gen provides 18 ports at 100 GB/s each bidirectional
  • GB200 NVL72 rack has 141 GB HBM3e per GPU effectively scaled
  • L2 cache in Blackwell is 132 MB per GPU
  • Memory bandwidth for B200 SXM is 8 TB/s
  • PCIe Gen5 x16 interface with 128 GB/s bandwidth
  • NVLink Switch supports 144 GPUs at 1.8 TB/s each
  • Blackwell HBM3e at 9.2 Gbps pin speed
  • NVL72 interconnect bandwidth totals 130 TB/s
  • Grace CPU in GB200 has 480 GB LPDDR5X memory
  • NV-HBI link between dies at 10 TB/s bidirectional
  • Blackwell supports 8 HBM3e stacks on B100 PCIe
  • Third-party HBM3e for Blackwell up to 16 stacks possible
  • SHARP in-network compute reduces data movement
  • NVL72 liquid-cooled design for full memory utilization
  • Blackwell L1 cache per SM is 256 KB
  • NVLink domain supports up to 576 GPUs
  • B200 has 192 GB HBM3E at 8 TB/s bandwidth
  • Grace-Blackwell NVLink-C2C at 900 GB/s
  • Decompression Engine handles 64:1 ratios at line rate
  • Blackwell B100 memory config: 12x 16 GB stacks

Memory and Bandwidth Interpretation

NVIDIA's Blackwell GPUs, including the B100 and B200, are built around a memory system designed for scale. Both carry 192 GB of HBM3e, with the B200 SXM delivering 8 TB/s of memory bandwidth from pins running at up to 9.2 Gbps, backed by smart caching: 132 MB of L2 per GPU and 256 KB of L1 per SM. The interconnect is equally aggressive. 5th Gen NVLink provides 18 bidirectional 100 GB/s ports for 1.8 TB/s per GPU, NVLink domains scale to 576 GPUs, the two dies communicate over a 10 TB/s NV-HBI link, Grace attaches via 900 GB/s NVLink-C2C with 480 GB of LPDDR5X behind it, and a liquid-cooled NVL72 rack totals 130 TB/s so the memory can be fully utilized. Data movement is cut further by SHARP in-network compute and a Decompression Engine that handles 64:1 ratios at line rate, while host I/O runs over PCIe Gen5 x16 at 128 GB/s: truly a marvel of modern computing.
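The interconnect figures above hang together arithmetically, which is worth making explicit: 18 NVLink ports at 100 GB/s bidirectional each reproduce the quoted 1.8 TB/s per GPU, and 72 such GPUs reproduce the quoted ~130 TB/s NVL72 aggregate. A minimal sketch of the arithmetic:

```python
# Consistency check on the NVLink figures quoted above.

NVLINK_PORTS_PER_GPU = 18
GBPS_PER_PORT = 100         # GB/s bidirectional per port
GPUS_PER_NVL72 = 72

per_gpu_tbps = NVLINK_PORTS_PER_GPU * GBPS_PER_PORT / 1000  # 1.8 TB/s
rack_tbps = per_gpu_tbps * GPUS_PER_NVL72                   # 129.6 TB/s, quoted as 130
print(f"Per GPU: {per_gpu_tbps:.1f} TB/s; NVL72 aggregate: {rack_tbps:.1f} TB/s")
```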

Performance Metrics

  • Blackwell B100 AI training performance is 30x faster than H100 for GPT-MoE-1.8T
  • GB200 NVL72 delivers 1.4 exaFLOPS of AI performance at FP4
  • Blackwell inference is 30x faster than Hopper for Llama 2 70B
  • B200 GPU offers 40 petaFLOPS FP4 Tensor performance
  • GB200 Superchip trains GPT-MoE 1.8T model 4x faster than H100
  • Blackwell platform renders 5x faster in NVIDIA RTX
  • NVL72 rack with 72 Blackwell GPUs scales to 130 TB/s bandwidth
  • Blackwell FP8 performance reaches 10 petaFLOPS per GPU
  • 25x faster inference on GPT-MoE-1.8T vs H100 SXM
  • Blackwell B100 FP16 performance is 10 petaFLOPS
  • GB200 NVL72 simulates physical AI 30x faster
  • Blackwell accelerates drug discovery 4x vs Hopper
  • 90x less cost and energy for inference with FP4 on Blackwell
  • Blackwell B200 delivers 2.5x more performance per watt
  • Llama 3.1 405B inference 4x faster on GB200 vs H100
  • Blackwell NVL72 handles 30x more users for chatbots
  • 5x faster AI rendering in Omniverse
  • Blackwell FP4 training throughput 2x Hopper
  • GB200 scales to 864 GPUs in NVL72 clusters
  • Blackwell Mixture of Experts training 4x faster
  • RTX 5090 based on Blackwell achieves 2x rasterization vs 4090
  • Blackwell professional viz 4x faster path tracing
  • B100 PCIe version 20 petaFLOPS FP4
  • Blackwell NVL72 FP8 performance 720 petaFLOPS

Performance Metrics Interpretation

On paper, Blackwell is a juggernaut. For training, NVIDIA quotes 30x the H100's throughput on GPT-MoE-1.8T for the B100, 4x faster training of that model on the GB200 Superchip, 4x faster Mixture of Experts training overall, and 2x Hopper's FP4 training throughput. At rack scale, the NVL72 reaches 1.4 exaFLOPS of FP4 and 720 petaFLOPS of FP8 across 72 GPUs and 130 TB/s of bandwidth, with GB200 deployments scaling to 864 GPUs in NVL72 clusters. For inference, the claims are 30x Hopper on Llama 2 70B, 25x on GPT-MoE-1.8T versus the H100 SXM, and 4x on Llama 3.1 405B for GB200 versus H100, alongside roughly 90% less cost and energy at FP4, 2.5x more performance per watt on the B200, and 30x more chatbot users per NVL72. Per GPU, quoted figures range from 20 to 40 petaFLOPS of FP4 depending on the source line, with 10 petaFLOPS of FP16 on the B100. The gains extend to graphics and science: the Blackwell-based RTX 5090 doubles rasterization over the 4090, professional visualization path-traces 4x faster, RTX and Omniverse render 5x faster, physical-AI simulation runs 30x faster, and drug discovery accelerates 4x over Hopper.
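Two of the rack-level figures can be cross-checked against per-GPU numbers. At 20 petaFLOPS of FP4 per GPU (the figure consistent with the rack total, rather than the 40 petaFLOPS also quoted for the B200), 72 GPUs give roughly the quoted 1.4 exaFLOPS, and 10 petaFLOPS of FP8 per GPU gives exactly the quoted 720 petaFLOPS. A short sketch under those assumptions:

```python
# Cross-check: per-GPU FLOPS versus the quoted NVL72 rack totals.

GPUS_PER_NVL72 = 72
FP4_PFLOPS_PER_GPU = 20   # the B200 FP4 figure consistent with the rack total
FP8_PFLOPS_PER_GPU = 10   # implied by 720 PF / 72 GPUs

fp4_rack_exaflops = GPUS_PER_NVL72 * FP4_PFLOPS_PER_GPU / 1000  # 1.44 EF, quoted as 1.4
fp8_rack_petaflops = GPUS_PER_NVL72 * FP8_PFLOPS_PER_GPU        # 720 PF, as quoted
print(f"NVL72: {fp4_rack_exaflops:.2f} exaFLOPS FP4, "
      f"{fp8_rack_petaflops} petaFLOPS FP8")
```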

Power and Efficiency

  • Blackwell B100 TDP is 700W for air-cooled version
  • B200 SXM TDP reaches 1000W with liquid cooling
  • GB200 NVL72 rack consumes 120 kW total power
  • Blackwell delivers 25x better energy efficiency for inference
  • 4x better perf per watt in training vs Hopper H100
  • NVL72 achieves 277 TFLOPS/rack FP64 at 25x less power
  • Blackwell GPU voltage optimized for 4NP process efficiency
  • Liquid cooling in GB200 reduces power by 300 kW per rack vs air
  • Blackwell 2.5x perf/watt improvement via FP4
  • RAS engine reduces power overhead for reliability
  • B100 PCIe TDP 700W with 600W sustained
  • GB200 Superchip TDP 2700W combined
  • 90% reduction in cost and energy for trillion-parameter inference
  • Blackwell efficiency enables 1 MW AI factories
  • TSMC 4NP process yields 15% perf boost at iso-power
  • OPAI reduces power for sparse inference
  • Full NVL72 clusters operate efficiently at up to 1.2 MW power density
  • Blackwell Tensor Cores 30% more efficient at low precision
  • Dynamic power management in Blackwell SMs
  • 10x better total cost of ownership for AI clusters

Power and Efficiency Interpretation

Blackwell pairs high absolute power draw with large claimed efficiency gains. The air-cooled B100 runs at a 700 W TDP (600 W sustained on the PCIe card), the liquid-cooled B200 SXM at 1,000 W, the GB200 Superchip at 2,700 W combined, and a full NVL72 rack at about 120 kW. Against that, NVIDIA quotes 25x better energy efficiency for inference, 4x better performance per watt in training than the Hopper H100, and a 90% reduction in cost and energy for trillion-parameter inference. The gains come from several directions: TSMC's 4NP process (roughly a 15% performance boost at iso-power) with voltage tuned to it, FP4 Tensor Cores delivering a 2.5x perf/watt improvement and 30% better efficiency at low precision, dynamic power management in the SMs, a RAS engine that trims reliability overhead, and OPAI cutting power for sparse inference. Liquid cooling helps too, with NVIDIA claiming large rack-level power savings versus air cooling. At system scale the results include 277 TFLOPS of FP64 per rack at 25x less power, clusters dense enough for 1 MW AI factories and 1.2 MW full-cluster deployments, and 10x better total cost of ownership for AI clusters: by this pitch, speed and efficiency no longer trade off.
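The rack-level power figure roughly decomposes against the per-part numbers above: 36 GB200 Superchips at the quoted 2.7 kW combined TDP account for about 97 kW, leaving roughly 23 kW of the 120 kW rack budget for NVLink switches, networking, and power delivery. Attributing the remainder that way is an assumption on our part, not a published breakdown; a sketch:

```python
# Rough NVL72 power budget from the quoted figures.

SUPERCHIPS_PER_RACK = 36   # each GB200 pairs 2 Blackwell GPUs with 1 Grace CPU
SUPERCHIP_KW = 2.7         # quoted 2700 W combined TDP
RACK_KW = 120              # quoted NVL72 rack power

superchip_kw = SUPERCHIPS_PER_RACK * SUPERCHIP_KW  # 97.2 kW
other_kw = RACK_KW - superchip_kw                  # ~22.8 kW (assumed: switches, networking, power delivery)
print(f"Superchips: {superchip_kw:.1f} kW; other rack load: {other_kw:.1f} kW")
```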

System Integration and Availability

  • Blackwell GB200 NVL72 available Q4 2024
  • Partners include AWS, Google, Microsoft, and Oracle for Blackwell deployment
  • DGX B200 systems with 8 Blackwell GPUs shipping 2025
  • HGX B200 for OEM integration announced
  • NVIDIA AI Enterprise software optimized for Blackwell
  • Blackwell production on TSMC 4NP started H1 2024
  • GB200 NVL72 pre-orders from major hyperscalers
  • CUDA 12.3 supports Blackwell preview
  • NVIDIA NIM microservices for Blackwell inference
  • Blackwell in RTX 50-series consumer GPUs late 2024
  • Annual Blackwell production over 500,000 GPUs estimated
  • Price for B100 around $30,000-$40,000 per unit rumored
  • NVL72 rack priced at $3 million each
  • Blackwell validated on Neoverse V2 for Grace
  • Support for BlueField-3 DPUs in Blackwell systems
  • Omniverse Cloud runs on Blackwell clusters
  • Blackwell powers Project DIGITS supercomputer
  • Mass production of GB200 started Q3 2024
  • Blackwell PCIe boards for standard servers Q1 2025

System Integration and Availability Interpretation

NVIDIA's Blackwell rollout spans 2024-2025. Production on TSMC 4NP started in H1 2024, GB200 mass production followed in Q3 2024, and the GB200 NVL72 became available in Q4 2024 on the back of pre-orders from major hyperscalers, with annual output estimated at over 500,000 GPUs. DGX B200 systems with eight Blackwell GPUs ship in 2025, the HGX B200 is ready for OEM integration, PCIe boards for standard servers arrive in Q1 2025, and the consumer RTX 50-series brings Blackwell to desktops. Deployment partners include AWS, Google, Microsoft, and Oracle; the platform is validated with Neoverse V2-based Grace CPUs, supports BlueField-3 DPUs, and powers Omniverse Cloud and the Project DIGITS supercomputer. Software is keeping pace, with CUDA 12.3 offering preview support, NVIDIA AI Enterprise optimized for the architecture, and NIM microservices targeting Blackwell inference. Rumored pricing puts the B100 at $30,000-$40,000 per unit and an NVL72 rack at about $3 million: a bold, expensive bet on staying the AI chip leader.