👨‍🔧 Andrej Karpathy says “I’ve never felt this much behind as a programmer”
Karpathy on feeling behind, AI agents and memory, Gemini’s traffic surge, NVIDIA’s $20B Groq move, and why Google’s keeping TPUs private.
Read time: 8 min
📚 Browse past editions here.
(I publish this newsletter daily. Noise-free, actionable, applied-AI developments only.)
⚡In today’s Edition (27-Dec-2025):
👨‍🔧 Andrej Karpathy says “I’ve never felt this much behind as a programmer”
🧠 Memory in the Age of AI Agents
🏆 Gemini’s web traffic is soaring.
🛠️ The hardware reason why NVIDIA paid $20 billion for a Groq license
💨 OPINION: Google’s not going to sell TPUs broadly. Doing that would turn their in-house architectural lead into a public commodity
👨‍🔧 Andrej Karpathy says “I’ve never felt this much behind as a programmer”
Andrej Karpathy published a post on X that struck a chord with millions of programmers and industry practitioners, sparking intense discussion. Karpathy admitted, “I’ve never felt this much behind as a programmer.”
Karpathy wrote that modern programming is getting reshaped because developers now spend more time directing models, checking outputs, and patching mistakes than typing code line by line.
He mentioned that the programming field is going through a total shake-up. Developers now write far less code and spend more time connecting and managing different tools. Those who can really harness the new stuff that appeared over the past year can be 10 times more effective, but anyone who falls behind will end up stressed about their skills.
There’s now a whole new programmable abstraction layer to learn—things like agents, subagents, prompts, context, memory, patterns, permissions, tools, plugins, skills, hooks, MCP, LSP, slash commands, workflows, and IDE integration.
On top of that, programmers need a solid mental model to make sense of AI models that are random, imperfect, hard to reason about, and constantly evolving. These unpredictable systems are now tightly woven into traditional engineering methods.
As Karpathy said, it’s like everyone suddenly got an incredibly powerful alien gadget without any instructions. People have to figure it out on their own, and this shift has hit the industry like a magnitude 9 earthquake.
He writes, “I have a sense that I could be 10X more powerful if I just properly string together what has become available over the last ~year and a failure to claim the boost feels decidedly like skill issue.”
🧠 New paper covers Memory in the Age of AI Agents
This 102-page survey unifies agent memory research with 3 lenses: Forms, Functions, and Dynamics.
It replaces vague short-term vs long-term labels with mechanisms that explain how agents store, use, and change memory. Agent memory is a persistent read-write state across tasks, not just retrieval-augmented generation (RAG) over static documents.
Forms split memory into token-level stores, parametric memory in weights or adapters, and latent memory in hidden states like key-value (KV) caches. Functions split it into factual, experiential, and working memory, and Dynamics tracks formation, evolution, and retrieval as separate control problems.
The survey catalogs benchmarks and open frameworks, and it highlights a 2025 shift toward reinforcement learning (RL) to learn memory write, prune, and route policies. For builders, the framework is a checklist for choosing what to store, where it lives, and how it updates under token budgets.
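To make the 3 lenses concrete, here is a minimal toy sketch (my own illustration, not code from the survey; all class and method names are hypothetical) of a token-level memory store where write, prune, and retrieve are handled as separate policies under an explicit budget:

```python
from dataclasses import dataclass, field

@dataclass
class MemoryItem:
    text: str           # token-level form: memory stored as plain text
    kind: str           # function: "factual", "experiential", or "working"
    score: float = 1.0  # importance used by the prune policy

@dataclass
class AgentMemory:
    items: list[MemoryItem] = field(default_factory=list)
    budget: int = 50    # entry budget the dynamics must respect

    def write(self, text: str, kind: str, score: float = 1.0) -> None:
        # Formation: decide what enters memory (here: everything, with a score)
        self.items.append(MemoryItem(text, kind, score))
        self.prune()

    def prune(self) -> None:
        # Evolution: keep only the highest-scoring items under the budget
        self.items.sort(key=lambda m: m.score, reverse=True)
        del self.items[self.budget:]

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        # Retrieval: naive keyword overlap stands in for a learned routing policy
        q = set(query.lower().split())
        ranked = sorted(self.items,
                        key=lambda m: len(q & set(m.text.lower().split())),
                        reverse=True)
        return [m.text for m in ranked[:k]]

mem = AgentMemory()
mem.write("User prefers concise answers", kind="factual", score=2.0)
mem.write("Last call to the weather tool failed", kind="experiential")
print(mem.retrieve("what answer format does the user prefer?"))
```

The 2025 shift the survey describes is essentially about replacing hand-written write, prune, and retrieve heuristics like the ones above with policies learned via RL.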
🏆 Gemini’s web traffic is soaring.
Gemini’s market share jumped from 13.7% to 18.2% in a single month, and from 5.4% to 18.2% over the past 12 months. Gemini 3 is clearly doing the work, and Gemini 3 Flash is just warming up.
That’s not small movement, that’s a clear shift in control. Almost 13 points gained in a market long associated with OpenAI. A big driver of Gemini’s jump is distribution, because Gemini can be placed inside Google surfaces people already open, which turns casual queries into counted visits.
On the other hand, ChatGPT’s share fell from 87.2% to 68.0% in the same window, a 19 point drop. If web usage reflects user choice at scale, then Google is firmly winning share.
Also to note, ChatGPT can lose share even while growing, because the denominator (total GenAI web visits across all tools) is exploding as more tools and more usage enter the bucket. Meanwhile, Microsoft Copilot stayed nearly flat, slipping from 1.5% to 1.2%. However, Copilot’s “flat” web share is not a clean verdict on adoption, because lots of Copilot usage can happen inside native apps and enterprise flows that never look like a public web visit.
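A quick back-of-envelope with made-up numbers (purely illustrative, not Similarweb data) shows how a tool can grow in absolute visits while still losing share:

```python
# Hypothetical numbers: share falls while absolute traffic grows,
# because the denominator (total GenAI web visits) grows faster.
before_total, before_chatgpt = 1_000, 872    # 87.2% share
after_total,  after_chatgpt  = 1_500, 1_020  # 68.0% share

print(before_chatgpt / before_total)   # 0.872
print(after_chatgpt / after_total)     # 0.68
print(after_chatgpt > before_chatgpt)  # True: visits still grew ~17%
```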
🛠️ The hardware reason why NVIDIA paid $20 billion for a Groq license
First, a quick note on GPUs, TPUs, and Groq’s LPU.
GPU
Built for graphics first, the GPU crunches thousands of pixels in parallel. For AI, it treats an LLM like one huge parallel job.
The bottleneck is HBM sitting off the chip. Every token means fetching weights from external memory, which creates a memory wall where the cores wait on data.
The logic is a hub-and-spoke design. It is super flexible and can handle training and gaming, but it is not perfectly efficient for the step-by-step nature of text generation.
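A rough way to see the memory wall: for a single stream, every new token re-reads the model weights from HBM, so decode speed is approximately bounded by bandwidth divided by bytes read per token. The numbers below are illustrative, not measured benchmarks:

```python
# Single-stream decode bound: tokens/sec <= memory_bandwidth / bytes_read_per_token.
# Ignores KV-cache reads and batching, which change the picture considerably.
model_params = 70e9        # e.g. a 70B-parameter model
bytes_per_param = 2        # fp16/bf16 weights
hbm_bandwidth = 3.35e12    # ~3.35 TB/s, roughly an H100-class GPU

bytes_per_token = model_params * bytes_per_param
print(hbm_bandwidth / bytes_per_token)  # ~24 tokens/sec upper bound for one stream
```

Batching many users together hides this, which is why GPUs still dominate high-throughput serving; the wall bites hardest on latency for a single user.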
TPU
A TPU is an ASIC focused on tensor math. It uses a systolic array, like a heart pushing data through a processor grid. Data flows from one unit to the next without round trips to main memory.
The logic is high efficiency on massive batches, so it shines in training and heavy inference. For a single user question, it can still run into latency.
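A toy simulation of the systolic idea (my own heavily simplified illustration, not TPU code): each processing element holds one weight, activations stream across the grid, and partial sums accumulate as they pass from unit to unit with no round trips to main memory:

```python
import numpy as np

# Weight-stationary toy: PE(i, j) holds W[i, j]; activation x[j] enters column j
# and each PE adds its contribution to the partial sum flowing along row i.
W = np.arange(1.0, 10.0).reshape(3, 3)   # weights parked inside the PEs
x = np.array([1.0, 2.0, 3.0])            # activations streamed in

partial = np.zeros(3)
for j in range(3):          # activation x[j] flows into column j...
    for i in range(3):      # ...and each PE in that column updates its row's partial sum
        partial[i] += W[i, j] * x[j]

print(partial)    # [14. 32. 50.]
print(W @ x)      # same result, computed the conventional way
```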
Groq LPU
Groq’s LPU takes a different path. No HBM. It uses on-chip SRAM.
The speed win is big. SRAM can be up to 100x faster than HBM, so there is basically zero fetch time.
GPUs schedule work dynamically in hardware, so execution order is not fully predictable. LPUs are deterministic: the compiler decides where every piece of data will be at each moment, like a tightly timed assembly line. Groq actually built the automated compiler first, then the chip, because founder Jonathan Ross, who had worked on Google’s TPU, knew software was the pain point and that a startup could not match Nvidia’s ~10,000 kernel engineers. So there are no manual kernel tweaks on LPUs; each token’s route is fixed ahead of time.
Where it excels is LLM inference. Text is produced one token at a time, and the LPU streams those tokens on its conveyor-style design, so Groq hits hundreds of tokens per second while many GPUs struggle around 50.
AI inference has 2 steps: Prefill and Decode.
Prefill means the model reads your whole prompt and context. Decode means it writes the reply one small chunk of text at a time.
These 2 steps favor different hardware. Prefill benefits from large memory capacity so it can hold long context; raw speed matters less. Decode benefits from extremely fast memory bandwidth and very low latency; memory size matters less.
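Here is a minimal sketch of the two phases with a stand-in model() function instead of real attention math (everything below is hypothetical; the point is the access pattern, not the model):

```python
# Prefill: one big parallel pass over the prompt (capacity-bound).
# Decode: one token per step, each step re-reading weights + KV cache (bandwidth/latency-bound).
def model(tokens: list[int], kv_cache: list[int]) -> tuple[int, list[int]]:
    kv_cache = kv_cache + tokens                    # the cache grows with every token seen
    return (sum(kv_cache) * 31) % 50_000, kv_cache  # stand-in for a real forward pass

prompt = [101, 2023, 2003, 1037, 2307]              # hypothetical token ids

kv_cache: list[int] = []
next_token, kv_cache = model(prompt, kv_cache)      # prefill: whole prompt at once

output = []
for _ in range(5):                                  # decode: strictly one token at a time
    output.append(next_token)
    next_token, kv_cache = model([next_token], kv_cache)

print(output)
```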
Now, Groq’s LPU is a complete departure from the GPU/TPU designs above. It doesn’t use HBM (external memory) at all. Instead, it uses SRAM (static random-access memory) built directly into the chip’s silicon, so the data sits right next to the compute and there is essentially zero fetch time.
Groq’s big advantage is how fast it runs single-user inference. Thanks to its compute layout and on-chip-only memory, it hits one of the fastest single-user tokens-per-second rates on the market.
The downside is that the chip has no external DDR or HBM memory, only onboard SRAM. Fast as it is, the roughly 230 MB of capacity per chip is tiny, which means even a mid-sized open-source model like Llama 70B requires about 10 racks of processors and over 100 kW of power to run.
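The capacity math is easy to sanity-check. Assuming fp16 weights (Groq typically runs lower precision, which cuts this a lot), a 70B model does not come close to fitting on one chip:

```python
# Back-of-envelope only: ignores the KV cache, activations, and quantization.
model_params = 70e9
bytes_per_param = 2        # fp16/bf16
sram_per_chip = 230e6      # ~230 MB of on-chip SRAM per LPU

chips_needed = (model_params * bytes_per_param) / sram_per_chip
print(round(chips_needed))  # ~609 chips just to hold the weights, hence racks of hardware
```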
Groq’s LPUs don’t need liquid cooling, and that’s a pretty big deal. Most data centers around the world still use air cooling, not liquid. Nvidia’s Blackwell and upcoming chips, on the other hand, will mainly rely on liquid cooling since they’re built for top performance.
Nvidia’s 3 Rubin variants line up with that prefill/decode split. Rubin is the main GPU with HBM4 and very high bandwidth, used for training and high-throughput decode. It is usually a dual-die part with NVLink and large HBM capacity.
Rubin CPX is a sibling tuned for prefill. It is single-die, uses 128GB GDDR7, about 30 PFLOPS NVFP4, cheaper and cooler, and accelerates long-context attention. It trades bandwidth for capacity and cost to make prefill efficient.
Rubin with HBM is the balanced workhorse for training and high throughput inference. Nvidia’s plan is to run prefill on CPX, then hand off decode to standard Rubin, all orchestrated in the same rack. That is the connection. They are designed to work together.
A Groq-style Rubin with SRAM would be very fast for token-by-token decode, but would hold far less on chip. Systems can use CPX or normal Rubin to do prefill, then hand off to the SRAM Rubin for the fast typing part.
The result: you get faster answers for interactive apps while keeping costs sensible by using the right chip for each step.
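A hedged sketch of that disaggregated-serving idea (the class names and handoff below are hypothetical stand-ins, not Nvidia’s actual stack): a capacity-heavy pool runs prefill and builds the KV cache, then a latency-optimized pool streams the decode:

```python
# Toy disaggregated serving: prefill on a capacity-optimized pool (CPX-style),
# decode on a bandwidth/SRAM-optimized pool, with the KV cache handed off in between.
class PrefillPool:
    def run(self, prompt: list[int]) -> list[int]:
        return list(prompt)                      # stand-in "KV cache" from one long-context pass

class DecodePool:
    def generate(self, kv_cache: list[int], max_new_tokens: int) -> list[int]:
        out = []
        for _ in range(max_new_tokens):
            tok = (sum(kv_cache) * 31) % 50_000  # stand-in for one decode step
            kv_cache.append(tok)                 # cache grows one token at a time
            out.append(tok)
        return out

prefill_pool, decode_pool = PrefillPool(), DecodePool()
kv = prefill_pool.run([101, 2023, 2003, 1037, 2307])  # prefill on the capacity-heavy pool
print(decode_pool.generate(kv, 5))                    # decode handed off to the fast pool
```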
After the deal, Groq will remain an independent company, hold onto its IP, and keep running GroqCloud (its online neocloud business), along with all the Middle Eastern deals it has signed over the years.
Now, about competition. Nvidia understands that if the HBM, energy, liquid cooling, and CoWoS limits choke the market and lead to a serious compute shortage, both customers and rivals will hunt for workarounds. In that situation, Groq, which doesn’t depend on the same supply chain constraints, becomes an obvious alternative.
My bet is almost every ASIC gets shelved except TPU, AI5, and Trainium. Taking on the 3 Rubin variants plus their networking chips will be rough. OpenAI’s ASIC sounds better than expected and ahead of Meta’s and Microsoft’s.
💨 OPINION: Google’s not going to sell TPUs broadly. Doing that would turn their in-house architectural lead into a public commodity
A lot of the “Google might sell TPUs” talk exists because there are already pretty credible signals that Google has started treating the TPU as a commercial product, not just an internal advantage. Google is running a big internal push called TorchTPU to make TPUs work cleanly with PyTorch, and reports say Google has begun selling TPUs directly into customers’ data centers, not only renting them inside Google Cloud.
Once you accept that premise, the “selling TPUs” story stops meaning “Google becomes a commodity chip shop like Nvidia.” It starts meaning “Google sells a controlled version of the TPU experience,” i.e. setups that look more like Google-managed services where the customer gets TPU hardware on their side, but Google stays involved with operations and the software stack. That kind of model is a way to get external adoption without fully giving away every internal trick, because you are not just shipping a chip in a box and walking away.
Think of Google as 3 groups that share the same expensive TPU factory.
Search and DeepMind want TPUs mainly as an internal weapon. They want cheap, reliable compute so they can ship better models, defend ads, and not pay Nvidia prices. Google Cloud is judged on a different scoreboard: it has to show customer wins, revenue growth, and proof that all the money spent building data centers and chips turns into cash, not just “internal advantage for other teams.”
So Google Cloud is sitting on a huge bill for AI infrastructure. If they only use it internally, that spend looks like a cost center from the cloud business point of view. If they sell TPU capacity (or TPU systems) to big outside buyers, that same infrastructure suddenly looks like a product that earns money, helps cloud hit growth targets, and helps justify building even more capacity.
There’s also a very practical scaling logic: external demand can help justify building a much larger TPU fleet than Google could comfortably build just to feed its own internal teams. You can see how big these numbers get from deals that are already public. In Oct-2025, Anthropic said it plans to expand its usage of Google Cloud technology, including up to 1 million TPUs, and it described the expansion as worth tens of billions of dollars, with well over 1 gigawatt of capacity expected online in 2026. If you are Google, deals like that are basically a financing mechanism for “build more TPU capacity than we could justify purely for ourselves,” which weakens the “every TPU sold is a TPU stolen from internal teams” argument.
Finally, a bunch of speculation is just “follow the switching-cost work.” Google wouldn’t put serious effort into making TPUs feel native to PyTorch if the plan was to keep TPUs mostly internal forever. TorchTPU is explicitly about lowering the biggest adoption blocker: most of the world’s AI codebases are built around PyTorch, and Nvidia’s advantage is heavily tied to that software ecosystem. If Google can make “PyTorch on TPU” feel easy, then selling TPUs (or TPU systems) becomes way more realistic commercially, and it also gives Google leverage against Nvidia even for workloads Google keeps internal.
If Google sells TPUs to outside customers at real scale, 3 things get messy fast.
First, they have to price TPUs against Nvidia GPUs. The moment they do that, pricing starts acting like a window into Google’s real costs. People can back-calculate what TPU compute likely costs Google internally, so the “Google has secret magic economics” story fades.
Second, selling bare-metal TPUs forces Google to publish way more detail: hardware specs, real benchmarks, and the full developer stack people need to run production workloads. That turns TPU from an internal advantage into a documented product that competitors can study, copy patterns from, and slowly replicate.
Third, it undercuts Google Cloud itself. Google Cloud already sells TPU access at a premium as a managed service. If customers can just buy TPUs directly and operate them, the biggest buyers will avoid the cloud markup and run them cheaper on their own infrastructure. That means Google ends up competing with its own higher-margin cloud business.
That’s a wrap for today, see you all tomorrow.





