What's actually inside a $100 billion AI data center?

Posted by admin
OpenAI and Microsoft are reportedly planning to build a $100 billion data center codenamed Stargate. We discuss how this compares to existing data centers and other planned investments in AI-centric infrastructure. They don't seem to have enough time to design special-purpose networking and other hardware, but it is a massive investment compared to other plans.

We discuss the design problems you have to solve when creating a data center: powering it, keeping it cool, providing networking, and building in resilience and redundancy for everything. We also discuss how Google data centers are a little different from the norm.

Finally, we discuss the AI chips that could be present in a data center. Primarily, this means Nvidia GPUs or Google TPUs. The choice of chip has implications for the software stack and the ultimate usability of the system.

Why Does OpenAI Need a 'Stargate' Supercomputer? Ft. Perplexity CEO Aravind Srinivas
https://www.youtube.com/watch?v=KXG2f-So9oo

Making AI accessible with Andrej Karpathy and Stephanie Zhan
https://www.youtube.com/watch?v=c3b-JASoPi0

Google AI Infrastructure Supremacy: Systems Matter More Than Microarchitecture
https://www.semianalysis.com/p/google-ai-infrastructure-supremacy

Microsoft & OpenAI consider $100bn, 5GW 'Stargate' AI data center - report
https://www.datacenterdynamics.com/en/news/microsoft-openai-consider-100bn-5gw-stargate-ai-data-center-report/

ELI5 : How's it that just 400 cables under the ocean provides all the internet to entire world and who actually owns and manages these cables
https://www.reddit.com/r/explainlikeimfive/comments/1390m3h/eli5_hows_it_that_just_400_cables_under_the_ocean/

Newmark: US data center power consumption to double by 2030
https://www.datacenterdynamics.com/en/news/us-data-center-power-consumption/

Data Centres and Data Transmission Networks
https://www.iea.org/energy-system/buildings/data-centres-and-data-transmission-networks

Understanding Data Center Costs and How they Compare to the Cloud
https://granulate.io/blog/understanding-data-center-costs-and-how-they-compare-to-the-cloud/

Cost estimate to build and run a data center with 100k AI accelerators - and plenty questions
https://www.reddit.com/r/datacenter/comments/1b5nv1v/cost_estimate_to_build_and_run_a_data_center_with/

Amazon Bets $150 Billion on Data Centers Required for AI Boom
https://www.bloomberg.com/news/articles/2024-03-28/amazon-bets-150-billion-on-data-centers-required-for-ai-boom

The world’s top data centre investors
https://www.fdiintelligence.com/content/data-trends/the-worlds-top-data-centre-investors-82669

#ai #datacenter #openai

0:00 Intro
0:26 Contents
0:33 Part 1: Data center gold rush
0:46 Server racks and data centers
1:28 What about spending $1 billion?
1:47 50,000 AI accelerators
2:12 $7 trillion previous plan
2:34 OpenAI asks for $10 billion
2:50 OpenAI asks for $100 billion
3:13 Codename Stargate, from science fiction show
3:37 100,000 GPU limit
4:03 Amazon invests $148 billion
4:29 Google has significant data center investment
5:10 Do we actually need this much compute?
5:57 Part 2: So you want to build a datacenter
6:46 Design challenge 1: Power consumption
7:11 Proportion of global power consumption
8:09 Collapse of carbon credit market
8:41 Data centers use prepurchased renewable electricity
9:27 Design challenge 2: Cooling
9:44 Power for cooling exceeds power for servers
10:06 Google runs hotter data centers
10:40 Design challenge 3: Networking
11:21 Fiber optic cables
12:26 Intra-rack and inter-GPU networks
13:20 Design challenge 4: Resilience and redundancy
14:36 Don't rely on a single data center
15:22 Stargate has a single region design
15:49 Part 3: The hardware secret sauce
16:25 Hardware stacks
16:28 Nvidia has a monopoly due to CUDA
16:59 Nvidia charges very high prices
17:23 Consumer-grade GPU clusters are cheaper
17:51 Google has TPUs as a GPU alternative
18:29 TPU microarchitecture
19:02 TPU network is a 3D torus
19:59 Software stacks
20:13 PyTorch and TensorFlow
20:52 Computational graph representation
21:25 Python reflection to create graph
21:59 XLA compiler from Google
22:26 Leverages LLVM compiler technology
23:04 Example: LLVM also used in web browsers
23:22 Will Stargate use their own chips, network?
24:19 AI chips use a lot more power than usual
24:45 Implications of building Stargate
25:12 Conclusion
26:00 AI-specific hardware suppliers
26:53 Outro

Posted May 22

65 views