1️⃣ AI training & inference = massive east-west traffic
Traditional data centers were built around north-south traffic (users ↔ servers).
AI clusters are dominated by east-west traffic (GPU ↔ GPU ↔ GPU).
- Training a large model requires constant gradient synchronization
- Thousands of GPUs exchange data every few microseconds
- Network bandwidth, not compute, becomes the bottleneck
👉 800G doubles bandwidth per link, cutting congestion without doubling fibers.
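A rough back-of-the-envelope sketch of that synchronization traffic, assuming pure data parallelism with a ring all-reduce; the model size, GPU count, and fp16 gradients are illustrative assumptions, and the math ignores overlap with compute, sharding, and protocol overhead:

```python
# Back-of-the-envelope: per-GPU all-reduce traffic for one training step.
# All numbers are illustrative assumptions, not measurements.

params = 70e9            # assumed model size: 70B parameters
bytes_per_grad = 2       # fp16 gradients
n_gpus = 1024            # assumed data-parallel group size

# A ring all-reduce moves roughly 2 * (N - 1) / N * payload per GPU.
payload_bytes = params * bytes_per_grad
traffic_per_gpu = 2 * (n_gpus - 1) / n_gpus * payload_bytes

link_400g = 400e9 / 8    # bytes/s at line rate, no overhead
link_800g = 800e9 / 8

print(f"Sync traffic per GPU per step: {traffic_per_gpu / 1e9:.0f} GB")
print(f"Best-case sync time @ 400G: {traffic_per_gpu / link_400g:.1f} s")
print(f"Best-case sync time @ 800G: {traffic_per_gpu / link_800g:.1f} s")
```

Halving that per-link sync time is exactly where the 800G payoff shows up.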
2️⃣ GPUs are scaling faster than networks
Look at the mismatch:
| Generation | GPU compute growth | Network link growth |
|---|---|---|
| Pre-AI | ~2× every ~2 yrs | 100G → 200G |
| AI era | 3–5× per gen | 400G → 800G |
NVIDIA’s latest systems assume:
- Higher bandwidth per GPU
- Lower latency
- Fewer hops
👉 400G fabrics start choking before GPUs are fully utilized.
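One way to see the mismatch: track how much network bandwidth is available per unit of GPU compute across generations. The growth factors below are rough assumptions pulled from the ranges in the table, purely for illustration:

```python
# Illustrative only: bandwidth-per-compute budget when GPU compute grows
# ~4x per generation (mid of the 3-5x range) but link speed only doubles.

compute = 1.0   # relative GPU compute, baseline = 1
link = 1.0      # relative per-GPU link bandwidth, baseline = 1

for gen in range(1, 4):
    compute *= 4
    link *= 2
    print(f"Gen +{gen}: bandwidth per unit of compute = {link / compute:.3f}x baseline")

# The fabric falls further behind every generation unless link speed
# jumps faster than a plain doubling -- hence the push to 800G and beyond.
```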
3️⃣ Fewer cables, ports, and power = real money
Data centers don’t just pay for optics — they pay for everything attached to them.
Moving from 400G to 800G lets operators:
- Cut port count in half
- Reduce switch radix
- Lower fiber density
- Save power per transported bit
Example:
- 2 × 400G links → 1 × 800G
- Same throughput, less heat, less space, fewer failure points
👉 Hyperscalers care about $ / bit / watt, not just raw speed.
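A minimal sketch of that $ / bit / watt framing. The module prices and power draws below are made-up placeholders (real figures vary widely by vendor, reach, and volume); the point is the shape of the comparison, not the numbers:

```python
# Hypothetical comparison: 2 x 400G modules vs 1 x 800G module for the
# same 800 Gb/s of throughput. Prices and wattages are placeholders.

def per_gbps(price_usd, power_w, gbps):
    return {"usd_per_gbps": price_usd / gbps, "watts_per_gbps": power_w / gbps}

two_x_400g = per_gbps(price_usd=2 * 600, power_w=2 * 12, gbps=800)
one_x_800g = per_gbps(price_usd=900, power_w=16, gbps=800)

print("2 x 400G:", two_x_400g)   # two ports, two fiber runs, two failure points
print("1 x 800G:", one_x_800g)   # one module carrying the same traffic
```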
4️⃣ AI racks are getting physically bigger
Modern AI racks:
- 8–72 GPUs per rack
- NVLink / copper inside the rack
- Optics between racks
As racks scale:
- Copper runs out of reach beyond a few meters at today's lane rates
- Optics take over sooner
- Higher speed per optical lane becomes mandatory
👉 800G is the sweet spot before 1.6T matures.
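The lane math behind "higher speed per optical lane": a module's total rate is just lanes × per-lane rate, so pushing the lane rate up keeps lane and fiber counts manageable as speeds climb. Today's 800G modules commonly run 8 × 100G lanes; the lane rates below are just the common PAM4 steps, used here for illustration:

```python
# Total module speed = number of lanes x per-lane rate.
def lanes_needed(total_gbps, lane_gbps):
    return total_gbps // lane_gbps

for total in (400, 800, 1600):
    for lane in (50, 100, 200):
        print(f"{total}G at {lane}G/lane -> {lanes_needed(total, lane)} lanes")
```

Without the jump to 100G (and next 200G) lanes, 800G and 1.6T would need impractically wide modules and fiber bundles.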
5️⃣ Switch silicon forced the transition
The optics follow the switches, not the other way around.
- 51.2T switches (Tomahawk 5 class)
- 64 ports × 800G = full utilization
- Running those ports at 400G would strand half the chip's capacity
👉 Once switch silicon goes 51.2T+, 800G optics are inevitable.
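The port arithmetic behind that: divide the ASIC's switching capacity by the per-port speed (both figures come from the bullets above):

```python
# How a 51.2 Tb/s switch ASIC maps onto front-panel ports.
switch_capacity_gbps = 51_200

for port_speed_gbps in (400, 800):
    ports = switch_capacity_gbps // port_speed_gbps
    print(f"{port_speed_gbps}G ports needed to use the full chip: {ports}")

# 64 x 800G fits a standard faceplate; driving the same chip with 400G ports
# means 128 ports (breakout cabling) or leaving half the silicon idle.
```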
6️⃣ Cost curves finally crossed
Earlier, 800G was:
- Too hot
- Too expensive
- Too complex
Now:
- DSP efficiency improved
- Optical engines more integrated
- Manufacturing yields rising
👉 800G now wins on $ / bit / watt, not just raw speed.
