Enter your email address below and subscribe to our newsletter

SpaceX builds AI training system in C for 220,000 GPUs

Share your love

  • SpaceX has nearly finished a custom AI training stack written in C, designed for 220,000 Nvidia 2.95% GB300 accelerators with 800G networking.gagadget
  • Elon Musk said the system will power Grok v5 for xAI, which merged into SpaceX earlier this year and is already training models up to 10 trillion parameters.kingy
  • The claimed tenfold speed advantage over Google’s 1.48% JAX framework remains unverified, with no independent benchmarks published yet.gagadget

SpaceX Builds Custom C-Based AI Training Stack for 220,000-GPU Cluster

SpaceX has nearly completed a custom AI training system written in the C programming language, designed to run on a cluster of 220,000 Nvidia GB300 accelerators connected via 800G networking. Elon Musk announced the project on X on May 28, claiming the system could deliver more than 10 times the speed of Google’s JAX framework for large training runs.gagadget

Bare-Metal Approach

“SpaceX has almost finished writing V1.0 of an in-house AI training stack in C that exact-maps to 220k GB300s with 800G NICs, making heavy use of pipeline parallelism and getting as close to bare metal as possible,” Musk wrote on X.x

The approach represents a departure from industry norms. Most AI labs train models on frameworks like JAX or Meta’s PyTorch, which use Python-based abstraction layers for ease of development. Writing directly in C eliminates those intermediary software layers, allowing the code to interact more directly with hardware — but at the cost of development flexibility.gagadget

Each Nvidia GB300 NVL72 rack houses 72 Blackwell Ultra GPUs and 36 Grace CPUs, with 800 Gb/s networking per GPU. At 220,000 accelerators, the cluster ranks among the largest AI training installations announced to date. The stack relies heavily on pipeline parallelism, a technique for splitting model training across thousands of chips simultaneously, where minimizing communication latency is critical to performance.basenor

Grok and the xAI Integration

Musk confirmed that the new training stack will power Grok v5, xAI’s next major model release. The system’s development comes after xAI merged into SpaceX in February 2026, consolidating the AI model, training hardware, and software under a single entity.kingy

SpaceX’s Colossus 2 cluster is already training seven models in parallel, including variants scaling up to 10 trillion parameters for Grok 5. The custom C stack appears aimed at further accelerating these efforts.basenor

Unverified Claims

The 10x speed claim over JAX remains unverified by independent benchmarks. JAX itself holds multiple MLPerf records, and no third-party testing of SpaceX’s system has been published. The proprietary nature of the stack also raises questions about auditability — unlike open frameworks, a closed system is harder for outside researchers to benchmark or build upon.gagadget

SpaceX is entering a domain where Google, Microsoft, and Meta have spent years optimizing training infrastructure. Whether the bare-metal approach delivers its promised gains at scale remains to be seen.

Leave a Reply

Your email address will not be published. Required fields are marked *

Stay informed and not overwhelmed, subscribe now!