AI "godmother" Dr Fei-Fei Li writes a viral post on spatial intelligence
Fei-Fei Li spotlights spatial intelligence, Backboard sets memory benchmark record, Anthropic undercuts OpenAI, and AI data centers scale into ultra-mega builds.
Read time: 8 min
Browse past editions here.
(I publish this newsletter daily. Noise-free, actionable, applied-AI developments only.)
In today's Edition (11-Nov-2025):
- AI "godmother" Dr Fei-Fei Li writes a viral post on spatial intelligence
- AI data centers are scaling into ultra-mega projects.
- Backboard hits a record 90.1% on the LoCoMo benchmark (which tests AI's long-term conversational memory), setting a new bar for how AI handles memory tasks.
- Anthropic Offers Cost Advantage Over OpenAI in 2025 LLM Market
AI "godmother" Dr Fei-Fei Li writes a viral post on spatial intelligence
LLMs are great with words but weak at grounded reasoning, so models that can represent and interact with space are needed.
Current multimodal models still fail at basics like estimating distance, maintaining object identity across views, predicting simple physics, and keeping video coherent for more than a few seconds.
The proposed answer is world models with 3 core abilities: generative, multimodal, and interactive, so the model can build consistent worlds, take many input types, and roll the state forward after actions.
These models have to keep physics and space consistent so objects act like they would in real life, not just sound correct like text from a chatbot. They need to work with many data types at once (images, sounds, text, and actions) and then decide what to do next in a fast perception-to-action loop.
They also need to understand how time flows, so they can predict what will happen when things move, like when a robot reaches for something or a person walks through a room. World Labs, the startup Li co-leads, is building these Large World Models, and its Marble tool is already impressive: a creator tool that maintains consistent 3D environments that can be explored and edited, lowering the cost of world building.
Training such models needs a universal task function that plays a role like next token prediction but bakes in geometry and physics.
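To make the analogy concrete, here is a rough sketch (my illustration, not World Labs' published recipe) of what such a task function could look like: swap next-token prediction for action-conditioned next-state prediction, so the training signal comes from how well the model rolls a latent world state forward. All names and dimensions below are placeholders.

```python
# Minimal sketch of an action-conditioned "next state" objective (illustrative only):
# instead of predicting the next token, the model predicts the next world state
# (e.g. a latent 3D scene representation) given the current state and an action.
import numpy as np

rng = np.random.default_rng(0)

def world_model(state: np.ndarray, action: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Toy stand-in for a learned dynamics model: predicts the next latent state."""
    return np.tanh(W @ np.concatenate([state, action]))

# Hypothetical rollout data: latent scene states s_t and actions a_t.
dim_s, dim_a, T = 8, 2, 16
states  = rng.normal(size=(T + 1, dim_s))
actions = rng.normal(size=(T, dim_a))
W = rng.normal(scale=0.1, size=(dim_s, dim_s + dim_a))

# The "universal task function": minimize the error of the rolled-forward state,
# which implicitly rewards keeping geometry and physics consistent over time.
loss = np.mean([
    np.sum((world_model(states[t], actions[t], W) - states[t + 1]) ** 2)
    for t in range(T)
])
print(f"next-state prediction loss: {loss:.3f}")
```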
Data must scale beyond text using internet images and videos, plus synthetic data, depth, and tactile signals, with algorithms that recover 3D from 2D frames at scale.
Architectures need 3D/4D aware tokenization, context, and memory, since flattening everything to 1D or 2D makes counting objects or long-term scene memory brittle.
World Labs points to a real-time generator called RTFM, which uses spatially grounded frames as memory to keep persistence while generating quickly.
The near term use is creative tooling where directors, game designers, and YouTubers can block scenes, change layouts, and iterate with natural prompts inside physically coherent sets. The mid term path is robotics, where the same models give robots a predictive map to plan grasps and navigation in homes and warehouses, cutting trial and error. The long term bet is science and simulation, where controllable worlds accelerate testing of designs in materials, biology, and climate by running fast āwhat ifā experiments before real trials.
AI data centers are scaling into ultra-mega projects.
According to exhaustive research from Epoch AI, by year end cumulative AI data center spend could pass $300B, near 1% of US GDP, which tops Apollo at 0.8% and the Manhattan Project at 0.4%.
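A quick back-of-the-envelope check of that 1% figure, assuming US GDP of roughly $30 trillion in 2025:

```python
# Sanity check of the "~1% of US GDP" claim.
# Assumption: US GDP of roughly $30 trillion in 2025 (approximate).
ai_dc_spend = 300e9          # cumulative AI data center spend, USD
us_gdp_2025 = 30e12          # assumed US GDP, USD
print(f"AI data center spend as share of GDP: {ai_dc_spend / us_gdp_2025:.1%}")
# -> 1.0%, consistent with the Apollo (0.8%) and Manhattan Project (0.4%) comparisons.
```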
The headline example is obviously OpenAI's Stargate Abilene, with 1 GW of power, >250x the compute of the GPT-4 cluster, land the size of 450 soccer fields, $32B of capex, thousands of workers, and roughly 2 years to build. Power availability picks the location more than latency does, since generating a model answer takes >100x longer than sending data from Texas to Tokyo, so even a relay via the Moon would not be the bottleneck.
Only a handful of countries can support many >1 GW sites, since 30 GW is about 5% of US power, 2.5% of China's, and 90% of the UK's. Projects usually start with on-site natural gas for firm capacity, then add grid interconnects to pull large amounts of wind and solar where available.
Inside the buildings, each server rack covers about 0.5 m^2 yet can draw as much electricity as 100 homes, so designs shift from air to liquid cooling to safely move heat. On current trajectories, AI data centers can support roughly 5x/year growth in frontier training compute for the next 2 years without forced decentralization.
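To see why the shift to liquid cooling is forced, here is the power density implied by those rack numbers, assuming an average US home draws roughly 1.2 kW on a continuous basis (an approximation I am adding):

```python
# Implied power density of a single rack, under the claims above.
# Assumption: an average US home draws roughly 1.2 kW of continuous power.
homes_per_rack = 100
kw_per_home = 1.2            # assumed average continuous draw per home, kW
rack_area_m2 = 0.5

rack_kw = homes_per_rack * kw_per_home
print(f"power per rack:  ~{rack_kw:.0f} kW")
print(f"power density:   ~{rack_kw / rack_area_m2:.0f} kW/m^2")
# ~120 kW per rack at ~240 kW/m^2, well beyond what air cooling handles comfortably.
```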
Operators might still spread training across regions if they can buy stranded or excess power more cheaply. AI's power share in the US is near 1% today versus 8% for lighting and 12% for air conditioning, but the mix will change if buildouts keep compounding.
The scale and the visible cooling gear make these projects hard to hide, since thousands of people and satellite imagery leave a trail. Stargate Abilene illustrates the new template: pair firm on-site generation with a large grid hook, optimize for cooling, and chase the cheapest clean megawatts at scale.
Backboard hits a record 90.1% on the LoCoMo benchmark (which tests AI's long-term conversational memory), setting a new bar for how AI handles memory tasks.
Backboard Achieves Highest Score Ever on LoCoMo (90.1%), Redefining AI Memory Performance. New stateful memory architecture. Standard protocol. Fully reproducible.
It's a reproducible setup: the evaluation scripts and API are public so anyone can rerun the exact evaluation. Check out the GitHub repo, where the full result set and replication scripts are open-sourced.
This result targets long term conversational memory, not generic reasoning benchmarks, and it lands as a new high against prior public numbers. The LoCoMo benchmark checks whether a system can remember facts across many chats, track details in long dialogues, and answer time based questions that depend on earlier context.
High LoCoMo means the system stores the right facts, tags them so search can find them fast, and fetches the exact fact when needed across long, messy histories. That means you repeat yourself less, prompts shrink, tool calls drop, and the assistant follows your preferences and past decisions more accurately over weeks of use.
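Concretely, "store, tag, and fetch" looks roughly like the sketch below. This is a generic illustration of a stateful memory layer, not Backboard's actual API or architecture:

```python
# Generic sketch of a stateful memory layer (illustrative only, not Backboard's API):
# write facts with tags and timestamps, then retrieve the best matches for a new query.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Fact:
    text: str
    tags: set[str]
    written_at: datetime = field(default_factory=datetime.now)

class MemoryStore:
    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def write(self, text: str, tags: set[str]) -> None:
        """Store a fact with searchable tags."""
        self.facts.append(Fact(text, tags))

    def recall(self, query_tags: set[str], k: int = 3) -> list[Fact]:
        """Return the k facts whose tags best overlap the query, newest first on ties."""
        scored = sorted(
            self.facts,
            key=lambda f: (len(f.tags & query_tags), f.written_at),
            reverse=True,
        )
        return scored[:k]

memory = MemoryStore()
memory.write("Prefers meetings after 2pm", {"preference", "calendar"})
memory.write("Project deadline moved to Friday", {"project", "deadline"})
for fact in memory.recall({"calendar", "preference"}, k=1):
    print(fact.text)   # -> Prefers meetings after 2pm
```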
Backboard's run uses the standard LoCoMo tasks, GPT-4.1 as the judge, fixed prompts and seed, and it publishes prompts, logs, and verdicts for every question to make replication straightforward.
Prior public numbers cluster around 67% to 69% for popular memory libraries, with a simple filesystem baseline near 74%, so 90.1% is a clear jump rather than noise.
So who is Backboard?
Built by unicorn founder Rob Imbeault, Backboard is an AI memory and retrieval platform that gives apps stateful memory, so they can write, update, and recall user facts across long, multi-session workflows.
It ships a developer API with 4 pieces: stateful memory, a multi-model API that routes across 2,200+ LLMs, stateful thread management, and a RAG layer for hybrid search, all aimed at making assistants remember accurately and act consistently over time.
The multi-model API claims unified access to 2,200+ LLMs with routing, analytics, and cost tracking, and it can switch models mid-chat while keeping the same memory in place.
As to the LoCoMo benchmark, it tests whether assistants remember, update, and retrieve user facts, preferences, calendar details, and commitments across threads. It covers single-hop, multi-hop, open-domain questions mixed with general knowledge, and temporal questions. Judging uses fixed prompts and seeds. High scores show reliable write, index, and retrieval, yielding fewer repeats and re-prompts, lower token bloat, and better follow-through. Backboard's 90.1% shows very little memory loss, though multi-hop questions still demand reasoning on top of retrieval.
Anthropic Offers Cost Advantage Over OpenAI in 2025 LLM Market
Anthropic thinks it can run AI more efficiently than OpenAI.
Anthropic's internal forecasts peg 2025 compute at ~$6B vs OpenAI's ~$15B, rising by 2028 to ~$27B vs ~$111B, implying a sizable unit-cost gap in Anthropic's favor.
That cost posture lines up with Anthropic's revenue path, which now targets ~$70B in 2028 and turning cash-flow positive in 2027, leaning on enterprise and API demand.
OpenAI's topline is still larger on paper, with ~$100B in 2028, but profitability is pushed out as infrastructure burn remains heavy, and it is not expected to be cash-flow positive until at least 2029.
The supply side story also fits, since Anthropic is spreading workloads across Google TPUs, Nvidia, and Amazon, including a fresh Google deal for up to 1M TPUs and >1GW capacity by 2026, which can compress cost per token if utilization is high.
Comparing model pricing between OpenAI and Claude: on a 1B-token workload, a Haiku-class run can land in the low 6 figures, while comparable OpenAI tiers from the Turbo era priced closer to the mid 6 figures, and that gap compounds at scale.
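The compounding is simple arithmetic; the sketch below uses placeholder per-million-token rates (not actual OpenAI or Anthropic list prices) just to show the mechanics:

```python
# How per-token price differences compound over a large workload.
# The rates below are placeholders, not quotes of any real pricing tier.
def workload_cost(total_tokens: float, price_per_million_usd: float) -> float:
    """Cost in USD for a workload billed per million tokens."""
    return total_tokens / 1e6 * price_per_million_usd

tokens = 1e9                      # a 1B-token workload
cheap, pricey = 1.0, 15.0         # placeholder $/M-token rates for two tiers

low  = workload_cost(tokens, cheap)
high = workload_cost(tokens, pricey)
print(f"cheaper tier:  ${low:,.0f}")
print(f"pricier tier:  ${high:,.0f}")
print(f"gap at 100x the volume: ${100 * (high - low):,.0f}")
# The absolute cost gap grows linearly with token volume, so a fixed per-token
# advantage turns into a large dollar advantage for API-heavy workloads.
```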
If Anthropic keeps this mix of price, context, and chip diversity, it wins more API-heavy enterprise workloads, while OpenAI's breadth still pulls ahead for frontier features, brand, and ecosystem, and buyers will likely split spend across both to hedge model risk and cost.
The cheaper, more efficient chips may be part of the reason that Anthropic is projecting that it will be profitable in 2027.
The report also notes that OpenAI's expensive $40 billion "backup" server build-out is part of its plan to eventually monetize hundreds of millions of nonpaying ChatGPT users, while Anthropic is generating 80% of its revenue from paid API access and isn't spending as much to serve its much smaller base of free users.
That's a wrap for today, see you all tomorrow.





