1️⃣ AI training & inference = massive east-west traffic

Traditional data centers were north-south (users ↔ servers).
AI clusters are east-west (GPU ↔ GPU ↔ GPU).

  • Training a large model requires constant gradient synchronization
  • Thousands of GPUs exchange data every few microseconds
  • Network bandwidth, not compute, becomes the bottleneck

👉 800G doubles bandwidth per link, cutting congestion without doubling fibers.
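The gradient-synchronization load above can be put in rough numbers. A ring all-reduce pushes about 2·(N−1)/N of the gradient payload through each GPU's link every step; the model size and GPU count below are illustrative assumptions, not measurements:

```python
def allreduce_bytes_per_gpu(param_bytes: float, n_gpus: int) -> float:
    """Bytes each GPU transfers per ring all-reduce: 2*(N-1)/N * payload."""
    return 2 * (n_gpus - 1) / n_gpus * param_bytes

# Illustrative assumption: 70B parameters, fp16 gradients (2 bytes each).
grad_bytes = 70e9 * 2
per_gpu = allreduce_bytes_per_gpu(grad_bytes, n_gpus=1024)

# Serialized transfer time per link, ignoring overlap and protocol overhead.
link_400g = 400e9 / 8  # bytes/s
link_800g = 800e9 / 8
print(f"per-GPU traffic per step: {per_gpu / 1e9:.0f} GB")
print(f"400G link: {per_gpu / link_400g:.1f} s")
print(f"800G link: {per_gpu / link_800g:.1f} s")
```

Even with aggressive compute/communication overlap, halving that transfer time is the difference between GPUs computing and GPUs waiting.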


2️⃣ GPUs are scaling faster than networks

Look at the mismatch:

Generation | GPU compute growth | Network link growth
Pre-AI     | ~2× every ~2 yrs   | 100G → 200G
AI era     | 3–5× per gen       | 400G → 800G

NVIDIA’s latest systems assume:

  • Higher bandwidth per GPU
  • Lower latency
  • Fewer hops

👉 400G fabrics start choking before GPUs are fully utilized.
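The mismatch in the table compounds generation over generation. A quick sketch, treating the growth rates above as rough assumptions (4× compute as the midpoint of 3–5×, one link doubling per generation):

```python
compute_growth = 4.0  # midpoint of the 3-5x per-generation range above
link_growth = 2.0     # e.g. 400G -> 800G is one doubling per generation

compute = link = 1.0
for gen in range(1, 4):
    compute *= compute_growth
    link *= link_growth
    print(f"gen {gen}: compute {compute:.0f}x, link {link:.0f}x, "
          f"gap {compute / link:.0f}x")
```

After three generations the per-GPU bandwidth gap has grown 8×, which is why each network upgrade only buys one generation of breathing room.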


3️⃣ Fewer cables, ports, and power = real money

Data centers don’t just pay for optics — they pay for everything attached to them.

800G vs 400G lets operators:

  • Cut port count in half
  • Reduce the number of switches and tiers
  • Lower fiber density
  • Save power per transported bit

Example:

  • 2 × 400G links → 1 × 800G
  • Same throughput with less heat, less space, and fewer failure points

👉 Hyperscalers care about $ / bit / watt, not just raw speed.
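The 2 × 400G → 1 × 800G consolidation translates directly into ports and watts. The module power figures below are illustrative assumptions for typical module classes, not vendor specs:

```python
# Illustrative module power assumptions (W); real values vary by module type.
POWER_W = {"400G": 12.0, "800G": 16.0}

def watts_per_tbps(speed_gbps: float, module_w: float, links: int) -> float:
    """Optics power normalized per Tbps of delivered bandwidth."""
    return (links * module_w) / (links * speed_gbps / 1000)

two_400 = watts_per_tbps(400, POWER_W["400G"], links=2)
one_800 = watts_per_tbps(800, POWER_W["800G"], links=1)
print(f"2 x 400G: {two_400:.0f} W/Tbps across 2 ports")
print(f"1 x 800G: {one_800:.0f} W/Tbps on 1 port")
```

Under these assumptions the 800G module delivers the same throughput at lower power per bit, with half the ports, cages, and fibers to install and fail.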


4️⃣ AI racks are getting physically bigger

Modern AI racks:

  • 8–72 GPUs per rack
  • NVLink / copper inside the rack
  • Optics between racks

As racks scale:

  • Passive copper runs out of reach beyond a few meters
  • Optics take over sooner
  • Higher speed per optical lane becomes mandatory

👉 800G is the sweet spot before 1.6T matures.
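The copper-to-optics handoff above can be sketched as a simple distance check. The reach values are rough illustrative assumptions (passive copper reach shrinks as the per-lane rate rises), not cable specs:

```python
# Illustrative passive-copper reach assumptions (meters) by lane rate (Gb/s).
# The trend is what matters: higher lane rate -> shorter copper reach.
COPPER_REACH_M = {50: 3.0, 100: 2.0, 200: 1.0}

def link_medium(distance_m: float, lane_gbps: int) -> str:
    """Pick copper while the hop fits within assumed passive-copper reach."""
    return "copper" if distance_m <= COPPER_REACH_M[lane_gbps] else "optics"

print(link_medium(1.5, 100))   # short in-rack hop
print(link_medium(20.0, 100))  # rack-to-rack hop
```

As lane rates climb, the same physical hop flips from "copper" to "optics", which is why larger racks pull optical links closer to the GPU.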


5️⃣ Switch silicon forced the transition

The optics follow the switches, not the other way around.

  • 51.2T switches (Tomahawk 5 class)
  • 64 ports × 800G = full utilization
  • Using 400G would waste switch capacity

👉 Once switch silicon goes 51.2T+, 800G optics are inevitable.
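The 51.2T arithmetic is just division, but it is worth making explicit: with 64 front-panel cages, only 800G modules fill the ASIC.

```python
asic_tbps = 51.2  # Tomahawk 5-class switch ASIC
ports = 64        # front-panel cages

for optic_gbps in (400, 800):
    used_tbps = ports * optic_gbps / 1000
    utilization = used_tbps / asic_tbps
    print(f"{optic_gbps}G optics: {used_tbps:.1f}T used, "
          f"{utilization:.0%} of ASIC capacity")
```

Populating a 51.2T switch with 400G optics strands half the silicon you paid for, so the optics upgrade follows the ASIC almost automatically.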


6️⃣ Cost curves finally crossed

Earlier, 800G was:

  • Too hot
  • Too expensive
  • Too complex

Now:

  • DSP efficiency improved
  • Optical engines more integrated
  • Manufacturing yields rising