Episodes

  • DeepSeek: 2 Months Out
    Apr 9 2025

    DeepSeek has been out for over 2 months now, and things have begun to settle down. We take this opportunity to contextualize the developments that have occurred in its wake, both within the AI industry and the world economy. As systems get more "agentic" and users are willing to spend increasing amounts of time waiting for their outputs, the value of supposed "reasoning" models continues to be peddled by AI system developers, but does the data really back these claims?

    Check out our DeepSeek minisode for a snappier overview!

    EPISODE RECORDED 2025.03.30

    • (00:40) - DeepSeek R1 recap
    • (02:46) - What makes it new?
    • (08:53) - What is reasoning?
    • (14:51) - Limitations of reasoning models (why we hate reasoning)
    • (31:16) - Claims about R1 training on OpenAI
    • (37:30) - “Deep Research”
    • (49:13) - Developments and drama in the AI industry
    • (56:26) - Proposed economic value
    • (01:14:20) - US government involvement
    • (01:23:28) - OpenAI uses MCP
    • (01:28:15) - Outro


    Links
    • DeepSeek website
    • DeepSeek paper
    • DeepSeek docs - Models and Pricing
    • DeepSeek repo - 3FS

    Understanding DeepSeek/DeepResearch

    Explainers

    • Language Models & Co. article - The Illustrated DeepSeek-R1
    • Towards Data Science article - DeepSeek-V3 Explained 1: Multi-head Latent Attention
    • Jina.ai article - A Practical Guide to Implementing DeepSearch/DeepResearch
    • Han, Not Solo blogpost - The Differences between Deep Research, Deep Research, and Deep Research

    Analysis and Research

    • Preprint - Understanding R1-Zero-Like Training: A Critical Perspective
    • Blogpost - There May Not be Aha Moment in R1-Zero-like Training — A Pilot Study
    • Preprint - Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
    • Preprint - Chain-of-Thought Reasoning In The Wild Is Not Always Faithful

    Fallout coverage

    • TechCrunch article - OpenAI calls DeepSeek 'state-controlled,' calls for bans on 'PRC-produced' models
    • The Verge article - OpenAI has evidence that its models helped train China’s DeepSeek
    • Interesting Engineering article - $6M myth: DeepSeek’s true AI cost is 216x higher at $1.3B, research reveals
    • Ars Technica article - Microsoft now hosts AI model accused of copying OpenAI data
    • The Signal article - Nvidia loses nearly $600 billion in DeepSeek crash
    • Yahoo Finance article - The 'Magnificent 7' stocks are having their worst quarter in more than 2 years
    • Reuters article - Microsoft pulls back from more data center leases in US and Europe, analysts say

    US governance

    • National Law Review article - Three States Ban DeepSeek Use on State Devices and Networks
    • CNN article - US lawmakers want to ban DeepSeek from government devices
    • House bill - No DeepSeek on Government Devices Act
    • Senate bill - Decoupling America's Artificial Intelligence Capabilities from China Act of 2025

    Leaderboards

    • aider
    • LiveBench
    • LM Arena
    • Konwinski Prize
    • Preprint - SWE-Bench+: Enhanced Coding Benchmark for LLMs
    • Cybernews article - OpenAI study proves LLMs still behind human engineers in over 1400 real-world tasks

    Other References

    • Anthropic report - The Anthropic Economic Index
    • METR Report - Measuring AI Ability to Complete Long Tasks
    • The Information article - OpenAI Discusses Building Its First Data Center for Storage
    • Deepmind report backing up this idea
    • TechCrunch article - OpenAI adopts rival Anthropic's standard for connecting AI models to data
    • Reuters article - OpenAI, Meta in talks with Reliance for AI partnerships, The Information reports
    • 2024 AI Index report
    • NDTV article - Ghibli-Style Images To Memes: White House Embraces Alt-Right Online Culture
    • Elk post on DOGE and AI
    1 h 32 m
  • DeepSeek Minisode
    Feb 10 2025

    DeepSeek R1 has taken the world by storm, causing a stock market crash and prompting further calls for export controls within the US. Since this story is still very much in development, with follow-up investigations and calls for governance being released almost daily, we thought it best to hold off for a little while longer so we can tell the whole story. Nonetheless, it's a big story, so we provide a brief overview of all that's out there so far.

    • (00:00) - Recording date
    • (00:04) - Intro
    • (00:37) - DeepSeek drop and reactions
    • (04:27) - Export controls
    • (08:05) - Skepticism and uncertainty
    • (14:12) - Outro


    Links
    • DeepSeek website
    • DeepSeek paper
    • Reuters article - What is DeepSeek and why is it disrupting the AI sector?

    Fallout coverage

    • The Verge article - OpenAI has evidence that its models helped train China’s DeepSeek
    • The Signal article - Nvidia loses nearly $600 billion in DeepSeek crash
    • CNN article - US lawmakers want to ban DeepSeek from government devices
    • Fortune article - Meta is reportedly scrambling ‘war rooms’ of engineers to figure out how DeepSeek’s AI is beating everyone else at a fraction of the price
    • Dario Amodei's blogpost - On DeepSeek and Export Controls
    • SemiAnalysis article - DeepSeek Debates
    • Ars Technica article - Microsoft now hosts AI model accused of copying OpenAI data
    • Wiz Blogpost - Wiz Research Uncovers Exposed DeepSeek Database Leaking Sensitive Information, Including Chat History

    Investigations into "reasoning"

    • Blogpost - There May Not be Aha Moment in R1-Zero-like Training — A Pilot Study
    • Preprint - s1: Simple test-time scaling
    • Preprint - LIMO: Less is More for Reasoning
    • Blogpost - Reasoning Reflections
    • Preprint - Token-Hungry, Yet Precise: DeepSeek R1 Highlights the Need for Multi-Step Reasoning Over Speed in MATH
    15 m
  • Understanding AI World Models w/ Chris Canal
    Jan 27 2025

    Chris Canal, co-founder of EquiStamp, joins muckrAIkers as our first ever podcast guest! In this ~3.5 hour interview, we discuss intelligence vs. competencies, the importance of test-time compute, moving goalposts, the orthogonality thesis, and much more.

    A seasoned software developer, Chris started EquiStamp as a way to improve our current understanding of model failure modes and capabilities in late 2023. Now a key contractor for METR, EquiStamp evaluates the next generation of LLMs from frontier model developers like OpenAI and Anthropic.

    EquiStamp is hiring, so if you're a software developer interested in a fully remote opportunity with flexible working hours, join the EquiStamp Discord server and message Chris directly; oh, and let him know muckrAIkers sent you!

    • (00:00) - Recording date
    • (00:05) - Intro
    • (00:29) - Hot off the press
    • (02:17) - Introducing Chris Canal
    • (19:12) - World/risk models
    • (35:21) - Competencies + decision making power
    • (42:09) - Breaking models down
    • (01:05:06) - Timelines, test time compute
    • (01:19:17) - Moving goalposts
    • (01:26:34) - Risk management pre-AGI
    • (01:46:32) - Happy endings
    • (01:55:50) - Causal chains
    • (02:04:49) - Appetite for democracy
    • (02:20:06) - Tech-frame based fallacies
    • (02:39:56) - Bringing back real capitalism
    • (02:45:23) - Orthogonality Thesis
    • (03:04:31) - Why we do this
    • (03:15:36) - Equistamp!


    Links
    • EquiStamp
    • Chris's Twitter
    • METR Paper - RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts
    • All Trades article - Learning from History: Preventing AGI Existential Risks through Policy by Chris Canal
    • Better Systems article - The Omega Protocol: Another Manhattan Project

    Superintelligence & Commentary

    • Wikipedia article - Superintelligence: Paths, Dangers, Strategies by Nick Bostrom
    • Reflective Altruism article - Against the singularity hypothesis (Part 5: Bostrom on the singularity)
    • Into AI Safety Interview - Scaling Democracy w/ Dr. Igor Krawczuk

    Referenced Sources

    • Book - Man-made Catastrophes and Risk Information Concealment: Case Studies of Major Disasters and Human Fallibility
    • Artificial Intelligence Paper - Reward is Enough
    • Wikipedia article - Capital and Ideology by Thomas Piketty
    • Wikipedia article - Pantheon

    LeCun on AGI

    • "Won't Happen" - Time article - Meta’s AI Chief Yann LeCun on AGI, Open-Source, and AI Risk
    • "But if it does, it'll be my research agenda latent state models, which I happen to research" - Meta Platforms Blogpost - I-JEPA: The first AI model based on Yann LeCun’s vision for more human-like AI

    Other Sources

    • Stanford CS Senior Project - Timing Attacks on Prompt Caching in Language Model APIs
    • TechCrunch article - AI researcher François Chollet founds a new AI lab focused on AGI
    • White House Fact Sheet - Ensuring U.S. Security and Economic Strength in the Age of Artificial Intelligence
    • New York Post article - Bay Area lawyer drops Meta as client over CEO Mark Zuckerberg’s ‘toxic masculinity and Neo-Nazi madness’
    • OpenEdition Academic Review of Thomas Piketty
    • Neural Processing Letters Paper - A Survey of Encoding Techniques for Signal Processing in Spiking Neural Networks
    • BFI Working Paper - Do Financial Concerns Make Workers Less Productive?
    • No Mercy/No Malice article - How to Survive the Next Four Years by Scott Galloway
    3 h 20 m
  • NeurIPS 2024 Wrapped 🌯
    Dec 30 2024

    What happens when you bring over 15,000 machine learning nerds to one city? If your guess didn't include racism, sabotage and scandal, belated epiphanies, a spicy SoLaR panel, and many fantastic research papers, you wouldn't have captured my experience. In this episode we discuss the drama and takeaways from NeurIPS 2024.

    Posters available at time of episode preparation can be found on the episode webpage.

    EPISODE RECORDED 2024.12.22

    • (00:00) - Recording date
    • (00:05) - Intro
    • (00:44) - Obligatory mentions
    • (01:54) - SoLaR panel
    • (18:43) - Test of Time
    • (24:17) - And now: science!
    • (28:53) - Downsides of benchmarks
    • (41:39) - Improving the science of ML
    • (53:07) - Performativity
    • (57:33) - NopenAI and Nanthropic
    • (01:09:35) - Fun/interesting papers
    • (01:13:12) - Initial takes on o3
    • (01:18:12) - WorkArena
    • (01:25:00) - Outro


    Links
    Note: many workshop papers had not yet been published to arXiv as of preparing this episode; the OpenReview submission page is provided in these cases.

    • NeurIPS statement on inclusivity
    • CTOL Digital Solutions article - NeurIPS 2024 Sparks Controversy: MIT Professor's Remarks Ignite "Racism" Backlash Amid Chinese Researchers’ Triumphs
    • (1/2) NeurIPS Best Paper - Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
    • Visual Autoregressive Model report (this link now provides a 404 error)
    • Don't worry, here it is on archive.is
    • Reuters article - ByteDance seeks $1.1 mln damages from intern in AI breach case, report says
    • CTOL Digital Solutions article - NeurIPS Award Winner Entangled in ByteDance's AI Sabotage Accusations: The Two Tales of an AI Genius
    • Reddit post on Ilya's talk
    • SoLaR workshop page

    Referenced Sources

    • Harvard Data Science Review article - Data Science at the Singularity
    • Paper - Reward Reports for Reinforcement Learning
    • Paper - It's Not What Machines Can Learn, It's What We Cannot Teach
    • Paper - NeurIPS Reproducibility Program
    • Paper - A Metric Learning Reality Check

    Improving Datasets, Benchmarks, and Measurements

    • Tutorial video + slides - Experimental Design and Analysis for AI Researchers (I think you need to have attended NeurIPS to access the recording, but I couldn't find a different version)
    • Paper - BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices
    • Paper - Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
    • Paper - A Systematic Review of NeurIPS Dataset Management Practices
    • Paper - The State of Data Curation at NeurIPS: An Assessment of Dataset Development Practices in the Datasets and Benchmarks Track
    • Paper - Benchmark Repositories for Better Benchmarking
    • Paper - Croissant: A Metadata Format for ML-Ready Datasets
    • Paper - Rethinking the Evaluation of Out-of-Distribution Detection: A Sorites Paradox
    • Paper - Evaluating Generative AI Systems is a Social Science Measurement Challenge
    • Paper - Report Cards: Qualitative Evaluation of LLMs

    Governance Related

    • Paper - Towards Data Governance of Frontier AI Models
    • Paper - Ways Forward for Global AI Benefit Sharing
    • Paper - How do we warn downstream model providers of upstream risks?
    • Unified Model Records tool
    • Paper - Policy Dreamer: Diverse Public Policy Creation via Elicitation and Simulation of Human Preferences
    • Paper - Monitoring Human Dependence on AI Systems with Reliance Drills
    • Paper - On the Ethical Considerations of Generative Agents
    • Paper - GPAI Evaluation Standards Taskforce: Towards Effective AI Governance
    • Paper - Levels of Autonomy: Liability in the age of AI Agents

    Certified Bangers + Useful Tools

    • Paper - Model Collapse Demystified: The Case of Regression
    • Paper - Preference Learning Algorithms Do Not Learn Preference Rankings
    • LLM Dataset Inference paper + repo
    • dattri paper + repo
    • DeTikZify paper + repo

    Fun Benchmarks/Datasets

    • Paloma paper + dataset
    • RedPajama paper + dataset
    • Assemblage webpage
    • WikiDBs webpage
    • WhodunitBench repo
    • ApeBench paper + repo
    • WorkArena++ paper

    Other Sources

    • Paper - The Mirage of Artificial Intelligence Terms of Use Restrictions
    1 h 27 m
  • OpenAI's o1 System Card, Literally Migraine Inducing
    Dec 23 2024

    The idea of model cards, which was introduced as a measure to increase transparency and understanding of LLMs, has been perverted into the marketing gimmick characterized by OpenAI's o1 system card. To demonstrate the adversarial stance we believe is necessary to draw meaning from these press-releases-in-disguise, we conduct a close read of the system card. Be warned, there's a lot of muck in this one.

    Note: All figures/tables discussed in the podcast can be found on the podcast website at https://kairos.fm/muckraikers/e009/

    • (00:00) - Recorded 2024.12.08
    • (00:54) - Actual intro
    • (03:00) - System cards vs. academic papers
    • (05:36) - Starting off sus
    • (08:28) - o1.continued
    • (12:23) - Rant #1: figure 1
    • (18:27) - A diamond in the rough
    • (19:41) - Hiding copyright violations
    • (21:29) - Rant #2: Jacob on "hallucinations"
    • (25:55) - More ranting and "hallucination" rate comparison
    • (31:54) - Fairness, bias, and bad science comms
    • (35:41) - System, dev, and user prompt jailbreaking
    • (39:28) - Chain-of-thought and Rao-Blackwellization
    • (44:43) - "Red-teaming"
    • (49:00) - Apollo's bit
    • (51:28) - METR's bit
    • (59:51) - Pass@???
    • (01:04:45) - SWE Verified
    • (01:05:44) - Appendix bias metrics
    • (01:10:17) - The muck and the meaning


    Links
    • o1 system card
    • OpenAI press release collection - 12 Days of OpenAI

    Additional o1 Coverage

    • NIST + AISI report - US AISI and UK AISI Joint Pre-Deployment Test
    • Apollo Research's paper - Frontier Models are Capable of In-context Scheming
    • VentureBeat article - OpenAI launches full o1 model with image uploads and analysis, debuts ChatGPT Pro
    • The Atlantic article - The GPT Era Is Already Ending

    On Data Labelers

    • 60 Minutes article + video - Labelers training AI say they're overworked, underpaid and exploited by big American tech companies
    • Reflections article - The hidden health dangers of data labeling in AI development
    • Privacy International article - Humans in the AI loop: the data labelers behind some of the most powerful LLMs' training datasets

    Chain-of-Thought Papers Cited

    • Paper - Measuring Faithfulness in Chain-of-Thought Reasoning
    • Paper - Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
    • Paper - On the Hardness of Faithful Chain-of-Thought Reasoning in Large Language Models
    • Paper - Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models

    Other Mentioned/Relevant Sources

    • Andy Jones blogpost - Rao-Blackwellization
    • Paper - Training on the Test Task Confounds Evaluation and Emergence
    • Paper - Best-of-N Jailbreaking
    • Research landing page - SWE Bench
    • Code Competition - Konwinski Prize
    • Lakera game - Gandalf
    • Kate Crawford's Atlas of AI
    • BlueDot Impact's course - Intro to Transformative AI

    Unrelated Developments

    • Cruz's letter to Merrick Garland
    • AWS News Blog article - Introducing Amazon Nova foundation models: Frontier intelligence and industry leading price performance
    • BleepingComputer article - Ultralytics AI model hijacked to infect thousands with cryptominer
    • The Register article - Microsoft teases Copilot Vision, the AI sidekick that judges your tabs
    • Fox Business article - OpenAI CEO Sam Altman looking forward to working with Trump admin, says US must build best AI infrastructure
    1 h 17 m
  • How to Safely Handle Your AGI
    Dec 2 2024

    While on the campaign trail, Trump made claims about repealing Biden's Executive Order on AI, but what will actually be changed when he gets into office? We take this opportunity to examine policies being discussed or implemented by leading governments around the world.

    • (00:00) - Intro
    • (00:29) - Hot off the press
    • (02:59) - Repealing the AI executive order?
    • (11:16) - "Manhattan" for AI
    • (24:33) - EU
    • (30:47) - UK
    • (39:27) - Bengio
    • (44:39) - Comparing EU/UK to USA
    • (45:23) - China
    • (51:12) - Taxes
    • (55:29) - The muck


    Links
    • SFChronicle article - US gathers allies to talk AI safety as Trump's vow to undo Biden's AI policy overshadows their work
    • Trump's Executive Order on AI (the AI governance executive order at home)
    • Biden's Executive Order on AI
    • Congressional report brief which advises a "Manhattan Project for AI"

    Non-USA

    • CAIRNE resource collection on CERN for AI
    • UK Frontier AI Taskforce report (2023)
    • International interim report (2024)
    • Bengio's paper - AI and Catastrophic Risk
    • Davidad's Safeguarded AI program at ARIA
    • MIT Technology Review article - Four things to know about China’s new AI rules in 2024
    • GovInsider article - Australia’s national policy for ethical use of AI starts to take shape
    • Future of Privacy Forum article - The African Union’s Continental AI Strategy: Data Protection and Governance Laws Set to Play a Key Role in AI Regulation

    Taxes

    • Macroeconomic Dynamics paper - Automation, Stagnation, and the Implications of a Robot Tax
    • CESifo paper - AI, Automation, and Taxation
    • GavTax article - Taxation of Artificial Intelligence and Automation

    Perplexity Pages

    • CERN for AI page
    • China's AI policy page
    • Singapore's AI policy page
    • AI policy in Africa, India, Australia page

    Other Sources

    • Artificial Intelligence Made Simple article - NYT's "AI Outperforms Doctors" Story Is Wrong
    • Intel report - Reclaim Your Day: The Impact of AI PCs on Productivity
    • Heise Online article - Users on AI PCs slower, Intel sees problem in unenlightened users
    • The Hacker News article - North Korean Hackers Steal $10M with AI-Driven Scams and Malware on LinkedIn
    • Futurism article - Character.AI Is Hosting Pedophile Chatbots That Groom Users Who Say They're Underage
    • Vice article - 'AI Jesus' Is Now Taking Confessions at a Church in Switzerland
    • Politico article - Ted Cruz: Congress 'doesn't know what the hell it's doing' with AI regulation
    • US Senate Committee on Commerce, Science, and Transportation press release - Sen. Cruz Sounds Alarm Over Industry Role in AI Czar Harris’s Censorship Agenda
    58 m
  • The End of Scaling?
    Nov 19 2024

    Multiple news outlets, including The Information, Bloomberg, and Reuters [see sources], are reporting an "end of scaling" for the current AI paradigm. In this episode we look into these articles, as well as a wide variety of economic forecasting, empirical analysis, and technical papers, to understand the validity and impact of these reports. We also use this as an opportunity to contextualize the realized versus promised fruits of "AI".

    • (00:23) - Hot off the press
    • (01:49) - The end of scaling
    • (10:50) - "Useful tools" and "agentic" "AI"
    • (17:19) - The end of quantization
    • (25:18) - Hedging
    • (29:41) - The end of upwards mobility
    • (33:12) - How to grow an economy
    • (38:14) - Transformative & disruptive tech
    • (49:19) - Finding the meaning
    • (56:14) - Bursting AI bubble and Trump
    • (01:00:58) - The muck


    Links
    • The Information article - OpenAI Shifts Strategy as Rate of ‘GPT’ AI Improvements Slows
    • Bloomberg article - OpenAI, Google and Anthropic Are Struggling to Build More Advanced AI
    • Reuters article - OpenAI and others seek new path to smarter AI as current methods hit limitations
    • Paper on the end of quantization - Scaling Laws for Precision
    • Tim Dettmers tweet on "Scaling Laws for Precision"

    Empirical Analysis

    • WU Vienna paper - Unslicing the pie: AI innovation and the labor share in European regions
    • IMF paper - The Labor Market Impact of Artificial Intelligence: Evidence from US Regions
    • NBER paper - Automation, Career Values, and Political Preferences
    • Pew Research Center report - Which U.S. Workers Are More Exposed to AI on Their Jobs?

    Forecasting

    • NBER/Acemoglu paper - The Simple Macroeconomics of AI
    • NBER/Acemoglu paper - Harms of AI
    • IMF report - Gen-AI: Artificial Intelligence and the Future of Work
    • Submission to Open Philanthropy AI Worldviews Contest - Transformative AGI by 2043 is <1% likely

    Externalities and the Bursting Bubble

    • NBER paper - Bubbles, Rational Expectations and Financial Markets
    • Clayton Christensen lecture capture - Clayton Christensen: Disruptive innovation
    • The New Republic article - The “Godfather of AI” Predicted I Wouldn’t Have a Job. He Was Wrong.
    • Latent Space article - $2 H100s: How the GPU Rental Bubble Burst

    On Productization

    • Palantir press release on introduction of Claude to US security and defense
    • Ars Technica article - Claude AI to process secret government data through new Palantir deal
    • OpenAI press release on partnering with Condé Nast
    • Candid Technology article - Shutterstock and Getty partner with OpenAI and BRIA
    • E2B
    • Stripe agents
    • RoboPAIR

    Other Sources

    • CBS News article - Google AI chatbot responds with a threatening message: "Human … Please die."
    • Biometric Update article - Travelers to EU may be subjected to AI lie detector
    • TechCrunch article - OpenAI’s tumultuous early years revealed in emails from Musk, Altman, and others
    • Richard Ngo tweet on leaving OpenAI
    1 h 7 m
  • US National Security Memorandum on AI, Oct 2024
    Nov 6 2024

    October 2024 saw the release of a National Security Memorandum and a US framework for using AI in national security contexts. We go through the content so you don't have to, pull out the important bits, and summarize our main takeaways.

    • (00:48) - The memorandum
    • (06:28) - What the press is saying
    • (10:39) - What's in the text
    • (13:48) - Potential harms
    • (17:32) - Miscellaneous notable stuff
    • (31:11) - What's the US government's take on AI?
    • (45:45) - The civil side - comments on reporting
    • (49:31) - The commenters
    • (01:07:33) - Our final hero
    • (01:10:46) - The muck


    Links
    • United States National Security Memorandum on AI
    • Fact Sheet on the National Security Memorandum
    • Framework to Advance AI Governance and Risk Management in National Security

    Related Media

    • CAIS Newsletter - AI Safety Newsletter #43
    • NIST report - Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile
    • ACLU press release - ACLU Warns that Biden-Harris Administration Rules on AI in National Security Lack Key Protections
    • Wikipedia article - Presidential Memorandum
    • Reuters article - White House presses gov't AI use with eye on security, guardrails
    • Forbes article - America’s AI Security Strategy Acknowledges There’s No Stopping AI
    • DefenseScoop article - New White House directive prods DOD, intelligence agencies to move faster adopting AI capabilities
    • NYTimes article - Biden Administration Outlines Government ‘Guardrails’ for A.I. Tools
    • Forbes article - 5 Things To Know About The New National Security Memorandum On AI – And What ChatGPT Thinks
    • Federal News Network interview - A look inside the latest White House artificial intelligence memo
    • Govtech article - Reactions Mostly Positive to National Security AI Memo
    • The Information article - Biden Memo Encourages Military Use of AI

    Other Sources

    • Physical Intelligence press release - π0: Our First Generalist Policy
    • OpenAI press release - Introducing ChatGPT Search
    • WhoPoo App!!
    1 h 16 m