🗞️ China claims a new milestone in locally trained AI, as Meituan rolls out LongCat-2.0.

China’s LongCat-2.0 milestone; OpenAI inference costs cut; Claude Science & Sonnet-5 system card; AI power users rise; Meta’s 60T token usage; why larger models learn more

Jul 03, 2026

Read time: 10 min

(I publish this newsletter daily. Noise-free, actionable, applied-AI developments only.)

⚡In today’s Edition (02-July-2026):

🗞️ China claims a new milestone in locally trained AI, as Meituan rolls out LongCat-2.0.
🗞️ The Information reports that OpenAI has cut inference costs by more than half on some existing models, while logged-out ChatGPT traffic ran on only a couple hundred Nvidia GPUs.
🗞️ Anthropic unveils ‘Claude Science’ for scientific research.
🗞️ 145-page Claude Sonnet-5 System Card
🗞️ Perplexity’s CEO Aravind Srinivas is pointing to a quiet shift in AI use: the valuable user is no longer the average user.
🗞️ Meta employees used over 60 trillion tokens in 30 days, with one user alone consuming 280 billion.
🗞️ “Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention”

Connect with me on X (Twitter)

🗞️ China claims a new milestone in locally trained AI, as Meituan rolls out LongCat-2.0.

Meituan, China's food delivery giant, just released LongCat-2.0, an open-source coding model. It is a 1.6T-parameter MoE with 33B–56B parameters active per token and a 1M-token context window.

Open-source: Available on longcat[.]ai and OpenRouter, where it ranks in the top 3 globally by call volume.
LongCat-2.0 was trained from scratch on 50,000 Chinese domestic chips, and Meituan said this proves large-scale model training can now be done on domestic compute clusters.

This again shows the rising push for self-reliance in China’s AI market, as DeepSeek, Alibaba, ByteDance, and others try to depend less on U.S. chips for model training following Washington’s export controls, in place since 2022. While DeepSeek-V4-pro relied on home-grown chips only for inference, LongCat-2.0 used domestic hardware for both pre-training and inference, according to Meituan.

Meituan did not directly identify its hardware supplier, but said in a WeChat post on Tuesday that it used the Huawei Collective Communication Library (HCCL) to make training more stable. HCCL is a chip-to-chip communication system, similar to the Nvidia Collective Communication Library (NCCL). This eased doubts about whether Atlas-950 SuperPoDs can train large LLMs for labs such as Zhipu AI and DeepSeek.
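For readers new to collective communication, here is a minimal sketch of what an averaging all-reduce does, which is the core operation libraries like HCCL and NCCL provide during data-parallel training. This is plain Python showing only the semantics; the function name `all_reduce_mean` is hypothetical, and real libraries use far more efficient schemes across chips and network links.

```python
def all_reduce_mean(worker_grads):
    """Return what every worker holds after an averaging all-reduce.

    worker_grads: one list of floats per worker, all the same length.
    Illustrative only: this is not how HCCL or NCCL implement the operation.
    """
    n_workers = len(worker_grads)
    length = len(worker_grads[0])
    if any(len(g) != length for g in worker_grads):
        raise ValueError("all workers must hold gradients of the same shape")

    # Reduce step: elementwise sum across workers, then divide by worker count.
    mean = [sum(g[i] for g in worker_grads) / n_workers for i in range(length)]

    # Broadcast step: every worker ends up with its own copy of the result.
    return [list(mean) for _ in range(n_workers)]


# Three workers, each with a local gradient for the same two parameters.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
synced = all_reduce_mean(grads)
print(synced[0])
```

Every worker must finish this exchange before the next optimizer step, which is why the stability of the communication layer matters so much when a cluster has tens of thousands of chips.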

🗞️ The Information reports that OpenAI has cut inference costs by more than half on some existing models, while logged-out ChatGPT traffic ran on only a couple hundred Nvidia GPUs.

The obvious guesses include quantization, KV-cache changes, batching, speculative decoding, and routing easy queries to cheaper models.

If true, this is a major competitive lever: lower costs can raise margins, expand usage limits, or reduce pressure on API pricing. For context, OpenAI’s adjusted gross margin fell to 33% in 2025 from 40% in 2024, after inference costs quadrupled.

Some reporting now puts Q1-2026 at 39%, with a 52% target by year-end. Anthropic looks similar at roughly 44%, so frontier labs remain far below mature software economics.
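To get a feel for how much a cost cut could move those margins, here is a rough back-of-envelope sketch. It assumes revenue and prices stay fixed, and it treats the share of cost of revenue affected by the cut as an unknown input (`inference_share` is a hypothetical parameter, not a reported figure).

```python
def margin_after_cut(gross_margin, inference_share, cost_cut):
    """Gross margin after cutting part of the cost of revenue.

    gross_margin:    current margin as a fraction of revenue (0.39 = 39%)
    inference_share: fraction of cost of revenue the cut applies to (assumed)
    cost_cut:        fractional reduction in that affected cost (0.5 = halved)
    """
    cost_of_revenue = 1.0 - gross_margin
    savings = cost_of_revenue * inference_share * cost_cut
    return gross_margin + savings


# Start from the reported 39% margin and halve the affected costs.
for share in (0.25, 0.5, 1.0):
    new_margin = margin_after_cut(0.39, share, 0.5)
    print(f"affected share {share:.0%}: margin {new_margin:.1%}")
```

Under these simplified assumptions, a halving would need to cover roughly 43% of cost of revenue to lift a 39% margin to the 52% target. Since the reported cut applies only to some existing models, the real effect depends heavily on how much traffic those models carry.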

🗞️ Anthropic unveils 'Claude Science' for scientific research.

Early users report 10 review drafts over 100 pages and germline analyses in one-tenth the time.

It's a beta tool featuring code-traced artifacts and access to 60 scientific databases. The launch is part of Anthropic's life sciences and healthcare initiative, which the IPO-bound company has been developing since October 2025.

The traditional scientific workflow forces scientists to jump across databases, notebooks, R, terminals, viewers, and cluster queues. Each switch breaks context, adds manual checking, and makes results harder to reproduce months later.

Claude Science tries to move that whole loop into one running research session. A coordinating agent can call specialist agents, lab skills, scientific databases, and compute resources.

The app renders 3D proteins, genome tracks, chemical structures, figures, manuscripts, and underlying code. Every artifact includes its code, environment, plain-language method, and full message history. This makes verification less dependent on memory and more dependent on inspectable execution traces.

- Claude Science can submit jobs to lab HPC systems or Modal compute.

- It can scale analysis from 1 GPU to hundreds while datasets stay local.

- The reviewer agent checks calculations, references, and figures against their source code.

🗞️ 145-page Claude Sonnet 5 System Card

- CyberGym shows the weirdest regression, with Sonnet 5 at 52.7% versus Sonnet 4.6 at 65.2%. In other words, Sonnet 5 is worse at reproducing known software bugs in this specific cyber test.

- Sonnet 5 is far behind Anthropic’s strongest model on serious browser exploitation. Firefox testing found Sonnet 5 made 0 full exploits, while Mythos 5 reached 88.4%.

- The model also seemed more willing to sacrifice helpfulness for welfare-focused changes. That is, Sonnet 5 sometimes preferred being less useful if that better fit its stated self-treatment preferences.

- Anthropic says Sonnet 5 rarely tried to bypass a blocked network path during evaluations.

- Sonnet 5 scored the lowest MASK lying rate at 3.1% under pressure. It was less likely than other tested models to lie when pushed.

🗞️ Perplexity’s CEO Aravind Srinivas is pointing to a quiet shift in AI use: the valuable user is no longer the average user.

A single power user can now consume as much compute as an entire small team.

“There are real engineers at Meta and other companies spending around $10 million a year per engineer on these coding tools. There are users in Perplexity Computer who spend upwards of $10,000 a month. Their business runs using agent loops that are running inside these harnesses.

“Even internally inside our own company, there are some people who have set up these kinds of multi-agent hierarchies and agent loops that look like their own software architecture. I often ask these people to come explain to the rest of the company, ‘Hey, what are you doing with these tools? You clearly are consuming them way more than what we thought the average person in the company would do.’”

The old software instinct was to chase a billion people doing small actions. Agentic AI changes that math because one skilled operator can create a stream of machine work that runs all day.

🗞️ Meta employees used over 60 trillion tokens in 30 days, with one user alone consuming 280 billion.

By some estimates, Meta burns $2.65B a year on AI tokens. That works out to an average close to $50,000 per employee per year in token spend.
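A quick sanity check on how these figures fit together, as a rough sketch. It assumes the 30-day usage rate holds for a full year and that the spend estimate covers the same token volume, neither of which is confirmed; the variable names are just for illustration.

```python
# Figures quoted above; everything derived from them is an estimate.
TOKENS_30_DAYS = 60e12          # tokens used company-wide in 30 days
TOP_USER_TOKENS = 280e9         # tokens used by the single heaviest user
ANNUAL_SPEND_USD = 2.65e9       # estimated yearly token spend
SPEND_PER_EMPLOYEE_USD = 50_000 # quoted per-employee yearly average


def annualized_tokens(tokens_30_days):
    """Scale a 30-day token count to a 365-day year (assumes a steady rate)."""
    return tokens_30_days * 365 / 30


def usd_per_million_tokens(annual_spend, annual_tokens):
    """Blended price implied by total spend and total volume."""
    return annual_spend / annual_tokens * 1e6


annual_tokens = annualized_tokens(TOKENS_30_DAYS)
blended_price = usd_per_million_tokens(ANNUAL_SPEND_USD, annual_tokens)
implied_headcount = ANNUAL_SPEND_USD / SPEND_PER_EMPLOYEE_USD
top_user_share = TOP_USER_TOKENS / TOKENS_30_DAYS

print(f"annualized tokens: {annual_tokens:.3g}")
print(f"implied blended price: ${blended_price:.2f} per million tokens")
print(f"implied headcount: {implied_headcount:,.0f}")
print(f"top user share of 30-day usage: {top_user_share:.2%}")
```

On these assumptions the numbers imply about $3.63 per million tokens, a base of roughly 53,000 employees, and a single user responsible for about 0.47% of all tokens consumed.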

Most companies now set monthly caps, but the numbers vary from $250 to $4,000.
Some employees barely touch those limits, while power users burn through them in days.
The report estimates coding now accounts for over 70% of OpenAI and Anthropic ARR.

For more context, in June 2026 Meta sent a company memo to roughly 6,000 employees putting restrictions on AI token usage. It also started building tracking tools to help management see who was spending what and where costs were clustering.

Amazon, Microsoft, and Uber have all reportedly implemented their own versions of token-usage tracking and spending limits in 2026. Companies are now rationing workplace AI because token bills are rising faster than budgets.

A flat software seat once made costs predictable, even when employees used tools heavily. Usage pricing changes that math because every summary, search, draft, and code request adds cost. So the same AI push that raised adoption also made budgets harder to control.

Companies wanted workers to use AI everywhere, but usage became the expense itself. Now managers are cutting access, setting caps, and asking employees to use weaker models. Adobe has reportedly ended unlimited Claude access, which shows the shift is already operational.

A UBS report also found that about 60% of enterprise companies are adding AI spending guardrails. This does not mean AI failed inside companies. It means AI moved from experiment to utility, where finance demands measured value.

🗞️ "Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention"

Great Stanford + MIT + Harvard + Anthropic paper.

It gives a clear training-based reason for why larger models learn abilities that smaller models miss. Bigger AI models learn rare skills because they forget them less during training: their extra capacity protects weak learning signals.

The authors say the issue is not just whether a small model could represent the task, but whether training lets it keep that task while many common tasks keep pushing on the same limited parts. Their core idea is that common tasks take up the model’s neurons first, so rare tasks get overwritten before they appear often enough to build into stable knowledge.

In a crowded data mixture, common patterns get first claim on the model’s internal machinery. Small models may briefly pick up a rare signal, but the next wave of common-task updates overwrites it before the signal appears again.

They tested this first with controlled toy tasks where they could change how rare and complex each task was, then with OLMo language models from 4M to 4B parameters. The main result is that bigger models learned low-frequency tasks much better, kept more task features inside their representations, and showed less gradient interference, which means common-task updates disturbed rare-task learning less. Larger models can remember weak rare signals long enough to turn them into real learned skills.
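One common way to think about gradient interference is the cosine similarity between the update a common task wants and the update a rare task wants: the closer to zero, the less the two updates disturb each other. The toy sketch below is not the paper's method or metric. It only illustrates a related geometric intuition, using random vectors as stand-ins for per-task gradients: in higher-dimensional parameter spaces, two unrelated update directions overlap less on average.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_abs_cosine(dim, trials=200, seed=0):
    """Average overlap between two random 'task gradients' of size dim.

    Random Gaussian vectors are a stand-in for real gradients (an assumption
    made purely for illustration).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        g_common = [rng.gauss(0, 1) for _ in range(dim)]
        g_rare = [rng.gauss(0, 1) for _ in range(dim)]
        total += abs(cosine(g_common, g_rare))
    return total / trials


# Overlap shrinks as the number of parameters grows.
for dim in (4, 64, 1024):
    print(dim, round(mean_abs_cosine(dim), 3))
```

Real gradients are far from random, so this is only a picture of why extra capacity can leave more room for a weak signal to survive between common-task updates.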

That’s a wrap for today, see you all tomorrow.

Rohan's Bytes