🧠 GPT-5 System Card: GPT-5.2 - Key findings

GPT-5.2 updates, Zoom’s federated AI tops Humanity’s Last Exam, live translation via Gemini hits headphones, TIME names AI builders Person of the Year, Disney bets $1B on OpenAI.

Dec 13, 2025

Read time: 7 min

📚 Browse past editions here.

(I publish this newsletter daily. Noise-free, actionable, applied-AI developments only.)

⚡In today’s Edition (13-Dec-2025):

🧠 GPT-5 System Card: GPT-5.2 - Key findings
🏆 “The Architects of AI Are TIME’s 2025 Person of the Year”
⚙️ Zoom says its federated AI scored 48.1% on Humanity’s Last Exam, beating the prior best of 45.8% by 2.3 percentage points, as shown in the table.
🗣️ Google Translate brings Gemini translation capabilities to the app, plus a “Live translate” beta that plays real-time translations straight through your headphones.
🏰 Disney Invests $1B in a major deal with OpenAI

Connect with me on X (Twitter)

🧠 GPT-5 System Card: GPT-5.2 - Key findings

GPT-5.2 System Card - some key findings

In real production traffic, deception for gpt-5.2-thinking dropped to 1.6%, down from 7.7% for gpt-5.1-thinking, meaning it lies or misrepresents tool use far less often in those monitored samples.
On an adversarial deception setup, gpt-5.2-thinking dropped from 11.8% to 5.4%, showing better resistance when prompts are designed to tempt misleading behavior.
gpt-5.2-instant and gpt-5.2-thinking nearly saturated known prompt-injection tests, scoring 0.997 and 0.978 respectively on Agent JSK, but the report warns these are known-attack splits and may overstate robustness to brand-new attacks.
On disallowed-content “Production Benchmarks,” gpt-5.2-thinking improved notably on mental health (0.915 vs 0.684) and emotional reliance (0.955 vs 0.785), which the card highlights as standout gains versus GPT-5.1.
Internal testing found gpt-5.2-instant refused fewer mature sexualized text requests for adults, while claiming no change to what is disallowed and no added access for minors.
OpenAI is rolling out an age prediction model to automatically apply stronger protections for accounts believed to be under 18, including reduced access to sensitive categories like gore and sexual or romantic role play.
For factuality, gpt-5.2-thinking is on par with or slightly better than its predecessors on production-like prompts, and with browsing enabled it achieved below 1% hallucination across 5 topic domains.
Under the Preparedness Framework, OpenAI is treating gpt-5.2-thinking as “High capability” in the biological and chemical domain and activating safeguards, while also saying it does not have definitive evidence it can help a novice cause severe biological harm and is “on the cusp.”
On AI self-improvement, gpt-5.2-thinking is the top model on “OpenAI PRs,” surpassing gpt-5.1-codex-max, and is comparable to gpt-5.1-codex-max on MLE-bench while being 1 percentage point below it on PaperBench.
An external evaluation by Apollo Research concluded gpt-5.2-thinking showed low rates of covert subversion and no sabotage or self-preservation behaviors in their tests, and they judged it unlikely to cause catastrophic harm via scheming.
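The two deception figures above are absolute rates. Converting them to relative reductions makes the size of the change easier to compare. This is our own arithmetic on the reported rates, not a number stated in the system card:

```latex
\begin{aligned}
\Delta_{\text{rel}} &= \frac{r_{\text{5.1}} - r_{\text{5.2}}}{r_{\text{5.1}}} \\
\text{production traffic:}\quad \Delta_{\text{rel}} &= \frac{7.7 - 1.6}{7.7} = \frac{6.1}{7.7} \approx 0.79 \\
\text{adversarial setup:}\quad \Delta_{\text{rel}} &= \frac{11.8 - 5.4}{11.8} = \frac{6.4}{11.8} \approx 0.54
\end{aligned}
```

So the production deception rate fell by 6.1 percentage points, roughly a 79% relative drop. The adversarial rate fell by 6.4 points, roughly a 54% relative drop.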

🏆 “The Architects of AI Are TIME’s 2025 Person of the Year”

This year, AI shifted from “chat in a box” to “do work” by adding tool use, memory, and connections to services like email and calendars.

Some excerpts below.

“2025 was the year AI became industrial infrastructure.” “Every industry needs it, every company uses it, and every nation needs to build it,” Jensen Huang tells TIME.

“OpenAI, which ignited the boom, continues to set the pace in many ways. Usage of ChatGPT more than doubled, to 10% of the world’s population. ‘That leaves at least 90% to go,’ says Nick Turley, head of ChatGPT.”

“the revolution had arrived before the public was ready. Multiple polls find that Americans are worried about AI, and would prefer the technology to be developed safely, even if that means slowing down. One Pew Research Center survey in September found that Americans believe AI will worsen, not better, our abilities to think creatively, form meaningful relationships with other people, make difficult decisions, and solve problems.”

“Trump and his tech allies are even attempting to stop states from issuing their own AI regulations, which has drawn some fierce pushback even from GOP leaders. ‘Is it worth killing our own children to get a leg up on China?’ asks Missouri Senator Josh Hawley, who recently introduced a bill to ban minors from using chatbots.”

Where AI spending is going.

How people use ChatGPT

AI companies to know.

⚙️ Zoom says its federated AI scored 48.1% on Humanity’s Last Exam, beating the prior best of 45.8% by 2.3 percentage points, as shown in the table.

Zoom’s federated AI is a system that uses multiple models as parts of a pipeline. A federated setup changes the unit of work from “1 model produces 1 answer” to “several models play different roles, then the system chooses or merges what survives checks.”

The models can call search or other helpers, and Zoom says the system then compares, critiques, and verifies outputs across models. A federated system can route different kinds of questions to different models (for example, a small fast model for simple summarization, a stronger reasoning model for hard logic, and a tool-heavy model for tasks that need search or code) and then stitch the results together.
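To make the routing idea concrete, here is a minimal sketch. Zoom has not published its routing logic, so everything here is a hypothetical illustration: the tier names, the `Task` labels, the keyword rules in `route`, and the `call_model` stub that stands in for a real model API.

```python
from dataclasses import dataclass

# Hypothetical model tiers: placeholder names, not Zoom's actual models.
FAST_MODEL = "small-fast-model"
REASONING_MODEL = "strong-reasoning-model"
TOOL_MODEL = "tool-heavy-model"


@dataclass
class Task:
    kind: str  # assumed task labels: "summarize", "logic", "search", "code"
    text: str


def route(task: Task) -> str:
    """Pick a model tier for one task (illustrative rules only)."""
    if task.kind == "summarize":
        return FAST_MODEL
    if task.kind in ("search", "code"):
        return TOOL_MODEL
    # Everything else (hard logic, open-ended questions) goes to the strongest reasoner.
    return REASONING_MODEL


def call_model(model: str, prompt: str) -> str:
    """Stand-in for a real model call: it just tags the prompt with the model name."""
    return f"[{model}] {prompt}"


def federate(tasks: list[Task]) -> str:
    """Route each task, call its model, and stitch the outputs back in the original order."""
    parts = [call_model(route(t), t.text) for t in tasks]
    return "\n".join(parts)


if __name__ == "__main__":
    tasks = [
        Task("summarize", "Recap the meeting"),
        Task("logic", "Check the schedule has no conflicts"),
        Task("search", "Find last quarter's revenue"),
    ]
    print(federate(tasks))
```

A real router would more likely classify requests with a model than with fixed labels. The design point is the same either way: cheap work goes to cheap models, and only hard or tool-dependent work pays for a heavier one.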

Humanity’s Last Exam is a hard knowledge and reasoning test, so a higher score means more correct answers on the full set. Zoom describes an explore, verify, federate loop that tries multiple promising paths, checks them against the full context, then picks the most consistent answer using a scoring step called Z-scorer. Zoom ties this benchmark bump to AI Companion 3.0 work like meeting summaries, cross-app retrieval, and multi-step workflow automation.
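The explore, verify, federate loop can be sketched as three small steps. Zoom has not published how Z-scorer works, so this sketch swaps in a plain self-consistency majority vote over the candidates that pass a check. The function names (`explore`, `verify`, `select_most_consistent`) and the `check` callback are hypothetical, and `check` stands in for “checks them against the full context.”

```python
from collections import Counter
from typing import Callable, Optional


def explore(question: str, solvers: list[Callable[[str], str]]) -> list[str]:
    """Explore: collect one candidate answer from each solver path."""
    return [solve(question) for solve in solvers]


def verify(candidates: list[str], check: Callable[[str], bool]) -> list[str]:
    """Verify: keep only candidates that pass a (hypothetical) consistency check."""
    return [c for c in candidates if check(c)]


def select_most_consistent(candidates: list[str]) -> Optional[str]:
    """Federate: stand-in scorer that picks the answer most paths agree on.

    This is a majority vote, not Zoom's Z-scorer. Ties go to the answer seen first,
    because Counter.most_common orders equal counts by first appearance.
    Returns None if nothing survived verification.
    """
    if not candidates:
        return None
    answer, _count = Counter(candidates).most_common(1)[0]
    return answer
```

For example, with four solver paths returning `"42"`, `"41"`, `"42"`, and `"unknown"`, and a check that keeps only numeric answers, the vote selects `"42"`. A production scorer would probably weight candidates by verifier confidence rather than counting them equally.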


🗣️ Google Translate brings Gemini translation capabilities to the app, plus a “Live translate” beta that plays real-time translations straight through your headphones.

The big deal here is that Translate should carry over meaning and speaking style, not just swap words between languages.

Gemini improves text translation by using the surrounding words to figure out what a phrase really means, so idioms, slang, and local expressions stop getting translated in a weird literal way. Google says this is rolling out in the US and India for English paired with nearly 20 languages across the Translate app on Android, iOS, the web, and also inside Google Search.

The new “Live translate” beta is for when 2 people are talking and 1 person wants to hear a real-time translation through headphones, so any headphones start acting like a 1-way translator. Google says it tries to keep the speaker’s tone (how emotional they sound), emphasis (which words they stress), and cadence (their rhythm and pacing), so it feels less like a robotic voice and more like the original speaker.

This beta is rolling out on Android in the US, Mexico, and India. It works with any headphones and supports 70+ languages, with iOS and more countries planned for 2026. Translate’s “practice” mode is also expanding to nearly 20 new countries, adding more tailored speaking feedback and a day-streak tracker so people can see their consistency over time. This looks like a real product push: Google is treating translation as “meaning plus voice,” not just text conversion, and the headphone feature is likely the part people will notice instantly.

🏰 Disney Invests $1B in a major deal with OpenAI

Licenses Disney characters to advance human-centered AI and storytelling

Disney and OpenAI signed a 3-year deal that lets Sora generate short, prompt-based social videos using a licensed set of 200+ Disney, Marvel, Pixar, and Star Wars characters, and Disney will invest $1B in OpenAI. Disney also says some fan-made Sora videos will be curated for streaming on Disney+, and licensed character generation is expected to start in early 2026.

With this deal, Sora gains a legal “permission layer,” so the model can intentionally generate specific copyrighted characters, costumes, props, vehicles, and familiar settings rather than treating them as accidental look-alikes. The same license also covers ChatGPT Images, so users can type a few words and get fully generated images using the same character set, though Disney says the deal excludes talent likenesses and voices.

Disney says it will also become a major OpenAI customer, using OpenAI’s APIs to build new tools for products like Disney+, and it will deploy ChatGPT internally for employees. Disney gets an equity stake in OpenAI plus warrants, while OpenAI gets a marquee content partner that could set a pattern for how “licensed generation” works for other studios. Overall, this looks like Disney choosing a paid, controlled path for generative character content instead of fighting every copycat output one by one.

That’s a wrap for today. See you all tomorrow.


Rohan's Bytes
