The Memphis supercluster is now believed to be the most powerful supercluster for training AI models.
Back in May 2024, xAI announced that they had raised USD 6 billion. The main investors were the usual suspects: Marc Andreessen, Ben Horowitz, Sequoia Capital, and Al Waleed bin Talal Al Saud. xAI said at the time that they were going to use the money to build advanced infrastructure capable of training huge AI models. It looks like they have delivered on that promise.
So, they now have a supercluster with 100,000 liquid-cooled NVIDIA H100 Tensor Core GPUs. That is a lot of compute! And quite a feat, too. NVIDIA's NVLink Switch System lets you interconnect up to 256 H100 GPUs. It's remarkable that they managed to connect 100,000 of them.
Elon Musk has said that they used a technology called an RDMA (Remote Direct Memory Access) network fabric. And it's a single RDMA fabric. RDMA gives you high bandwidth and low latency by allowing the network cards of multiple computers to write data directly into each other's memory without any extra steps. In other words, it bypasses the CPU and the operating system on the data path. It's still, of course, not as fast as local RAM, but it's definitely faster than anything else we had before.
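To make that a little more concrete, here is a minimal sketch in C of the first step every RDMA application takes: registering a buffer with the NIC through the standard libibverbs API so that a remote peer can later read or write it directly, without the local CPU or OS touching the data. This is purely illustrative and assumes a generic RDMA-capable NIC; it is not a description of xAI's actual setup, and it stops before the queue-pair setup and connection exchange that a real program would need.

```c
/* Illustrative sketch only: register a buffer for RDMA using libibverbs.
 * Compile with: gcc rdma_sketch.c -libverbs */
#include <infiniband/verbs.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int num_devices = 0;
    struct ibv_device **devs = ibv_get_device_list(&num_devices);
    if (!devs || num_devices == 0) {
        fprintf(stderr, "No RDMA-capable devices found\n");
        return 1;
    }

    /* Open the first RDMA device (the NIC). */
    struct ibv_context *ctx = ibv_open_device(devs[0]);

    /* A protection domain groups resources that are allowed to work together. */
    struct ibv_pd *pd = ibv_alloc_pd(ctx);

    /* Register a buffer so the NIC can DMA into and out of it directly.
     * After this, a peer with the right key can read/write this memory
     * without involving our CPU or operating system. */
    size_t len = 4096;                       /* arbitrary example size */
    void *buf = malloc(len);
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);

    /* The rkey is what a remote peer presents to target this buffer
     * with RDMA READ/WRITE operations. */
    printf("Registered %zu bytes, rkey=0x%x\n", len, mr->rkey);

    /* ... queue-pair creation and connection setup would follow here ... */

    ibv_dereg_mr(mr);
    free(buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```

The key point the sketch illustrates is the "no extra steps" part: once the memory is registered, data movement is handled entirely by the network hardware, which is where the bandwidth and latency advantage over a conventional TCP/IP path comes from.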
Elon Musk has also said that he's going to have the "world's most powerful" AI model by…