🎤 Google DeepMind launches real-time, multilingual, emotion-aware Text-to-Speech
Google, OpenAI, Mistral, and Manus push boundaries in voice, code, and video AI while NVIDIA boosts model speeds and Claude expands tool access.
Read time: 9 min
📚 Browse past editions here.
(I publish this newsletter daily. Noise-free, actionable, applied-AI developments only.)
⚡In today’s Edition (4-Jun-2025):
🔥 Google DeepMind launches real-time, multilingual, emotion-aware audio dialog & fully controllable TTS with Gemini 2.5
🚨 Manus launches text-to-video tool to rival Sora, 1:1 pricing with OpenAI
📡 NVIDIA Blackwell is serving models >5X faster than Hopper-based endpoints.
🚨 Mistral launches Mistral Code: enterprise-first coding assistant with 4 AI models, on-prem support, and full-stack control.
🛠️ OpenAI’s Codex gets internet: Installs packages, fetches data, runs tests. Memory feature rolls out to free users.
🗞️ Byte-Size Briefs:
Google adds public link sharing to NotebookLM for AI research.
Claude Pro enables app integrations with remote MCP and tools.
Claude Pro plan now includes Claude Code usage and limits.
Yann LeCun reiterates scaling LLMs won't achieve human-level AI.
Ollama introduces toggleable “thinking” mode for step-by-step reasoning.
Google launches Vertex AI Ranking API to rerank noisy RAG output.
🔥 Google DeepMind launches real-time, multilingual, emotion-aware audio dialog & fully controllable TTS with Gemini 2.5
→ Gemini 2.5 now supports native real-time audio dialog with high expressivity, fast response, and style control — users can change tone, accent, even whisper via natural prompts.
→ It supports 24+ languages and can seamlessly switch languages mid-sentence. It also responds contextually to user emotion, tone, and background audio, only replying when it’s relevant.
→ Tool calling is integrated into voice interactions. This means Gemini can fetch real-time data (like from Google Search) or use custom tools during a conversation.
→ The model understands and discusses streaming audio and video feeds in real-time, useful for screen sharing or live video-based interaction.
→ On the text-to-speech side, Gemini 2.5 allows dynamic generation of long-form expressive speech. Users can dictate pace, pronunciation, emotional tone, and speaker accent.
→ Supports multi-speaker audio generation, useful for building podcast-like dialogues or narrative audio content from text input.
→ Two variants: Pro (high quality, complex prompts) and Flash (cost-efficient, faster). Both are in preview via Google AI Studio and Vertex AI.
→ All generated audio embeds SynthID watermark for traceability. Safety assessments include internal red-teaming and external evals to catch misuse or bias.
→ Developers can now use these features through the Gemini API. Real-time dialog is available in the “stream” tab, and controllable TTS in the “generate media” tab.
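For developers trying this out, here is a minimal sketch of the controllable-TTS call using the google-genai Python SDK. The model ID, voice name, and 24 kHz PCM output format follow the preview docs at the time of writing and may change, so treat this as a starting point rather than official sample code.

```python
# Minimal sketch: controllable TTS with the Gemini API via the google-genai SDK.
# Model ID, voice name, and audio format follow the preview docs and may change.
import wave
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Style is steered with plain natural-language instructions in the prompt.
prompt = "Say cheerfully, dropping to a whisper at the end: Have a wonderful day!"

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-tts",  # or gemini-2.5-pro-preview-tts
    contents=prompt,
    config=types.GenerateContentConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
            )
        ),
    ),
)

# The preview returns raw 24 kHz, 16-bit mono PCM; wrap it in a WAV container.
pcm = response.candidates[0].content.parts[0].inline_data.data
with wave.open("out.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)
    f.setframerate(24000)
    f.writeframes(pcm)
```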
🚨 Manus launches text-to-video tool to rival Sora, 1:1 pricing with OpenAI
→ Manus has launched a text-to-video generation feature targeting direct competition with OpenAI's Sora. Users input simple text prompts, and Manus converts them into full videos in minutes.
→ The tool is now in early access for Plus and Pro users. The Pro plan costs $199/month, nearly identical to Sora's $200/month via ChatGPT.
→ Unlike Sora, which is bundled into ChatGPT access, Manus offers a tiered subscription model. This gives users more pricing flexibility depending on usage needs.
→ No specific benchmarks, resolution specs, or duration limits have been disclosed by Manus yet, but the feature promises coherent video structuring from text alone.
→ Manus positions this launch as part of the broader shift toward AI-powered video production, aimed at creators who want fast, low-effort content generation.
→ Expansion to non-paying users is planned, which could widen access and pressure competitors to adjust pricing or feature sets.
→ As AI video tooling heats up, Manus is clearly trying to undercut or match OpenAI’s dominance by offering competitive pricing and accessible UX.
📡 NVIDIA Blackwell is serving models >5X faster than Hopper-based endpoints.
Per Artificial Analysis data, NVIDIA Blackwell endpoints are serving models more than 5x faster than Hopper-based endpoints.
→ NVIDIA’s Blackwell GPUs (GB200) use HBM3e memory and a 10 TB/s chip-to-chip interconnect, over 11x faster than Hopper’s 900 GB/s, and add native FP4 precision support.
→ Compared to Hopper (H100), which launched in 2022 with 80 billion transistors and 4 petaflops of AI throughput, Blackwell pushes up to 20 petaflops. Blackwell also adds a dedicated decompression engine with up to 900 GB/s of data-processing throughput, which Hopper lacked.
→ The second-gen Transformer Engine in Blackwell supports FP4, enabling higher throughput with lower precision, while maintaining model accuracy. Hopper supported FP8 through its first-gen Transformer Engine.
→ Hopper’s H100 supports up to 188GB of HBM3 memory (in the dual-GPU H100 NVL configuration) and FP64 for HPC workloads, delivering up to 60 teraflops of FP64 performance, and was focused more on inference and training of LLMs like Llama 3. It also integrated Confidential Computing for secure AI processing.
→ Blackwell further enhances Confidential Computing with TEE-I/O, introduces better quantum simulation capabilities, and is tailored for high-security, high-throughput environments. NVIDIA rates it as up to 25x more energy-efficient than Hopper for LLM inference.
→ NVIDIA's Blackwell platform also dominated MLPerf Training v5.0, becoming the only platform to submit on every benchmark — covering LLMs, recommender systems, multimodal models, object detection, and GNNs.
→ It was the sole submitter for the most compute-heavy test: Llama 3.1 405B pretraining. Blackwell achieved 2.2x better performance at the same scale over the prior generation.
→ On the Llama 2 70B LoRA fine-tuning task, DGX B200 systems with 8 Blackwell GPUs delivered 2.5x speedup versus earlier submissions using the same GPU count.
🚨 Mistral launches Mistral Code: enterprise-first coding assistant with 4 AI models, on-prem support, and full-stack control.
→ Mistral Code combines four AI models—Codestral (autocomplete), Codestral Embed (search), Devstral (agentic coding), and Mistral Medium (chat)—into a single coding assistant stack built for enterprise environments.
→ Unlike typical SaaS copilots, it runs locally or in secure clouds, keeping code inside enterprise boundaries. It’s fine-tunable and customizable—no API lock-in, full model control.
→ Private beta supports VSCode and JetBrains. Integrates via in-IDE assistant and CLI tools. Can complete full tickets: generate modules, update tests, execute shell commands—with configurable approval flows.
→ Enterprises get full stack ownership: Mistral provides the models, infra, plugins, observability, SLAs, and support. Admins get a console for RBAC, audit logs, seat management, and usage analytics.
→ Supports 80+ programming languages and understands Git diffs, terminal output, and issue contexts. Designed for actual multi-step dev tasks—not just autocomplete.
→ Capgemini, SNCF, and Abanca are already deploying Mistral Code at scale across thousands of developers, in setups spanning serverless, hybrid, and air-gapped environments.
→ Built on the open-source Continue foundation, but extended with enterprise controls and planned upstream contributions.
🛠️ OpenAI’s Codex gets internet: Installs packages, fetches data, runs tests. Memory feature rolls out to free users.
⚙️ The Details
→ OpenAI just upgraded Codex inside ChatGPT to support internet access during task runs. That means Codex can now install dependencies, run external tests, fetch data via APIs—basically act like a real dev environment. But it’s off by default—you’ll need to enable it in environment settings.
→ Internet access includes strict environment controls. You can whitelist specific domains and restrict HTTP methods to prevent abuse or data leaks. This keeps the feature usable in secure org settings.
→ Codex also now handles follow-ups better—continuing from an existing pull request instead of opening new ones for each step. This aligns better with real-world dev workflows and CI/CD practices.
→ Voice dictation is now supported directly inside ChatGPT—say your task and Codex writes code for it. This simplifies prompt writing especially for mobile or multitasking users.
→ The Codex rollout is live for ChatGPT Plus, Pro, and Team users. Enterprise access is still pending. Usage is generous but may face rate limits during peak hours.
→ Separately, ChatGPT’s memory feature—previously limited to paying users—is rolling out to free-tier users. It tracks recent chat history for more context-aware replies. But long-term memory (across sessions) still requires a Plus or Pro plan.
→ Free users can disable memory or use Temporary Chat mode to avoid storing session data. In the EEA, users must manually opt in via settings under Personalization > Memory.
🗞️ ChatGPT memory improvements are now rolling out to logged-in free users
ChatGPT memory stores useful details from conversations to make replies more relevant. It works in two ways:
Reference saved memories: User tells ChatGPT to remember specific facts (for example, dietary preferences). These details stay until the user deletes them.
Reference chat history: ChatGPT looks at recent chats for context. It doesn’t store every detail, just what’s needed to improve responses.
Users manage these settings in Settings > Personalization > Memory.
Users can turn memory on or off anytime in settings; turning off saved memories also disables chat-history referencing. Deleting a chat does not automatically erase saved memories; to remove a memory completely, the user must delete it in settings.
The net effect: free users now get short-term memory. ChatGPT references recent conversations for more personalized responses, while users keep control over what is stored and can delete it. Plus and Pro users get longer memory context.
🗞️ Byte-Size Briefs:
Google’s NotebookLM now lets you share your AI-powered research notebooks with anyone through a simple public link.
Claude Pro users now get access to Research and Integrations. You can build custom integrations for any app or tool with remote MCP, or connect pre-built servers like Zapier and Asana. Once connected, Claude can take action across your tools: creating tasks, updating documents, and triggering workflows. That means Claude can search across the web, your Google Workspace, and any connected tools, drawing insights from all your sources. (A minimal custom-server sketch follows the next item.)
Also, Claude Code is now included in the Claude Pro plan, which means Pro subscribers can use their rate limits across the Claude apps and Claude Code!
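If you want to roll your own integration, a remote MCP server can be tiny. Below is a minimal sketch using the FastMCP helper from the official mcp Python SDK; the server name and tool are illustrative, the transport name follows recent SDK versions, and you would still connect the server to Claude through the Integrations settings.

```python
# Minimal sketch of a custom remote MCP server that Claude's Integrations could connect to.
# Uses the FastMCP helper from the `mcp` Python SDK; names here are illustrative.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("task-tracker")  # hypothetical server name

@mcp.tool()
def create_task(title: str, assignee: str) -> str:
    """Create a task and return a confirmation string (stub for a real backend)."""
    # In a real integration this would call your task system's API.
    return f"Created task '{title}' for {assignee}"

if __name__ == "__main__":
    # Serve over HTTP so it can be added as a *remote* MCP integration.
    mcp.run(transport="streamable-http")
```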
Yann LeCun again says with extreme conviction that scaling current LLMs won’t get us to human-level AI, even in the next few years. Says: “It's a system with a gigantic memory and retrieval ability, not a system that can invent solutions to new problems, which is really what a PhD is.”
Ollama adds a toggleable "thinking" mode for models like DeepSeek R1 and Qwen 3—faster answers or step-by-step reasoning, your call. Thinking mode, when enabled (--think or /set think), shows a step-by-step reasoning segment followed by the answer. This is useful when transparency or explanation is needed, such as multi-step logic, debugging, or showing model reasoning in applications like NPCs or teaching tools. When disabled (--think=false or /set nothink), models skip the intermediate steps and just give the final output, which is optimized for speed.
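For programmatic use, here is a minimal sketch of the same toggle via the ollama Python client; the think parameter and the message.thinking field assume a recent client and server version, so verify against your installed release.

```python
# Minimal sketch: toggling Ollama's thinking mode from Python.
# Assumes a recent `ollama` client/server that exposes the CLI's --think toggle as `think=`.
import ollama

messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]

# Thinking enabled: the response carries a separate reasoning trace plus the answer.
resp = ollama.chat(model="deepseek-r1", messages=messages, think=True)
print("reasoning:", resp.message.thinking)
print("answer:", resp.message.content)

# Thinking disabled: skip the intermediate steps for a faster, shorter reply.
fast = ollama.chat(model="deepseek-r1", messages=messages, think=False)
print("answer:", fast.message.content)
```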
Google released the Vertex AI Ranking API to fix noisy search and RAG outputs, where up to 70% of retrieved content can lack relevance. It works as a semantic reranker, optimizing the last step in retrieval by reordering passages using deep semantic understanding.
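The Ranking API is surfaced through the Discovery Engine client libraries. Below is a minimal reranking sketch; the project ID, ranking-config path, and the semantic-ranker-512@latest model name are assumptions based on the preview docs, so check the current reference before relying on them.

```python
# Minimal sketch: reranking RAG candidates with the Vertex AI Ranking API.
# Exposed via the Discovery Engine client; config path and model name are assumptions.
from google.cloud import discoveryengine_v1 as discoveryengine

client = discoveryengine.RankServiceClient()
ranking_config = client.ranking_config_path(
    project="your-project-id",            # hypothetical project ID
    location="global",
    ranking_config="default_ranking_config",
)

# Candidate passages exactly as they came back from the retriever.
records = [
    discoveryengine.RankingRecord(id="1", title="Pricing FAQ", content="Plans start at $9/mo..."),
    discoveryengine.RankingRecord(id="2", title="Release notes", content="Version 2.1 fixes..."),
]

response = client.rank(
    request=discoveryengine.RankRequest(
        ranking_config=ranking_config,
        model="semantic-ranker-512@latest",  # assumed model name
        query="How much does the paid plan cost?",
        records=records,
        top_n=2,
    )
)

# Records come back reordered by semantic relevance, each with a score.
for record in response.records:
    print(record.id, record.score, record.title)
```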
That’s a wrap for today, see you all next time.