The New Claude 3.5 Sonnet: Better, Yes, But Not Just in the Way You Might Think

Posted by admin
A new state-of-the-art LLM (at least for creative writing and basic reasoning), but what lies behind the numbers that were put out? Is it for real, and are AI agents about to grab your mouse and shake your cursor? Plus: results on my own SimpleBench, and new tools from Runway (Act-One), HeyGen (Zoom calls) and an updated NotebookLM. AI, without the hype.

Weights and Biases' Weave: https://wandb.me/ai_explained
AI Insiders: https://www.patreon.com/AIExplained

Chapters:
00:00 – Introduction
00:57 – Claude 3.5 Sonnet (New) Paper
02:06 – Demo
02:58 – OSWorld
04:29 – Benchmarks compared + OpenAI Response
08:30 – Tau-Bench
13:09 – SimpleBench Results
17:05 – Yellowstone Detour
17:29 – Runway Act-One
18:44 – HeyGen Interactive Avatars + Demo
21:06 – NotebookLM Update

New Claude: https://www.anthropic.com/news/3-5-models-and-computer-use https://www.anthropic.com/research/developing-computer-use
Paper: https://assets.anthropic.com/m/1cd9d098ac3e6467/original/Claude-3-Model-Card-October-Addendum.pdf
Demo Diversion: https://x.com/AnthropicAI/status/1848742761278611504 https://www.youtube.com/watch?v=jqx18KgIzAE
o1 Comparison: https://openai.com/index/learning-to-reason-with-llms/ https://www.swebench.com/
Tau Bench: https://arxiv.org/pdf/2406.12045
OSWorld: https://arxiv.org/pdf/2404.07972
GSM Reasoning: https://arxiv.org/pdf/2410.05229
Sierra Valuation: https://www.theinformation.com/articles/bret-taylors-ai-agent-startup-nears-deal-that-could-value-it-at-over-4-billion?rc=sy0ihq
Claude Impressions: https://x.com/skirano/status/1848750867245133982
o1 System Card: https://assets.ctfassets.net/kftzwdyauwt9/67qJD51Aur3eIc96iOfeOP/71551c3d223cd97e591aa89567306912/o1_system_card.pdf
NotebookLM: https://notebooklm.google/
Runway Act-One: https://runwayml.com/research/introducing-act-one
HeyGen Zoom: https://labs.heygen.com/interactive-avatar/vicky
Ministral Comparison: https://x.com/armandjoulin/status/1846581336909230255

My Coursera Course - The 8 Most Controversial Terms in AI: https://imp.i384100.net/m57g3M
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
I use Descript to edit my videos (no pauses or filler words!): https://get.descript.com/ldgxfuj2bhnb

Many people expense AI Insiders for work. Feel free to use the template in the 'About' section of my Patreon: https://www.patreon.com/AIExplained
Posted Oct 23