🧠 AI discovers learning algorithm that outperforms those designed by humans.

AI invents better learning algorithms than humans, Meta slashes 600 AI jobs, Reddit sues Perplexity, a GitHub repo converts PDFs to Markdown using DeepSeek-OCR, plus China wins in robotics

Oct 23, 2025

Screenshot of the abstract page of a Nature manuscript dated 27 October 2025, titled “Discovering state-of-the-art reinforcement learning algorithms”, by Junhyuk Oh, Greg Farquhar, Iurii Kemaev, Dan A. Calian, Matteo Hessel, Luisa Zintgraf, Satinder Singh, Hado van Hasselt, and David Silver. The abstract discusses how humans use powerful RL mechanisms while artificial agents use hand-crafted rules, and presents a meta-learned update rule that achieves state-of-the-art results on Atari and transfers to new benchmarks. The page includes a note on its preprint status before final publication and the legal disclaimers that apply.

Read time: 11 min

(I publish this newsletter daily. Noise-free, actionable, applied-AI developments only.)

⚡In today’s Edition (23-Oct-2025):

🧠 AI discovers learning algorithm that outperforms those designed by humans.
🏆 GitHub Resource: Convert PDF documents to Markdown format using DeepSeek-OCR with FastAPI backend.
📡 Reddit sues Perplexity and 3 data brokers for allegedly scraping Reddit conversations without permission in a New York federal filing.
💼 Meta is cutting 600 jobs in its AI org, hitting Fundamental AI Research (FAIR), AI infrastructure, and AI product teams, while sparing the elite TBD Lab that houses many recent hires.
🇨🇳 Deep dive: China is flooding its factories with robots faster than anyone else.

Connect with me on X (Twitter)

🧠 AI discovers learning algorithm that outperforms those designed by humans

“This work has taken a step towards machine-designed reinforcement learning algorithms that can compete with and even outperform some of the best manually-designed algorithms in challenging environments.”

Context:

Just like people, AI learns through trial and error. But usually, humans have to kick things off by building the algorithms and setting the rules that guide how AI learns. Now though, as AI keeps improving, machines are starting to figure more things out on their own. One new AI system even came up with its own learning method and ended up creating an algorithm that beat the ones humans made on several difficult tasks.

For years, it’s been engineers designing the learning methods for AI, especially in reinforcement learning, where an AI gets rewards for making the right moves. While humans and animals have learning built into them through evolution, AI has to be taught step by step. That teaching process can be painfully slow and depends a lot on human creativity, which puts limits on how far it can go.

Inspired by how evolution works through random trial and error, the researchers built a massive digital population of AI agents. Each agent tackled a variety of tasks across many complicated environments, all using one specific learning rule.

Finally, the research team concludes that “RL algorithms required for advanced AI may soon be automatically discovered from the experiences of agents, rather than manually designed by humans”. The team reports a meta-learned reinforcement learning update rule that beats hand-designed algorithms, delivering state-of-the-art results on Atari and strong transfer to new benchmarks.

Most RL systems still rely on human written recipes for how to update the policy and value estimates, which locks progress to manual tuning that often breaks across tasks. This work treats the update rule itself as learnable, so the agent learns how to change its own policy and predictions from experience rather than following a fixed formula.

A large population of agents collects experience across many different environments, and an outer learning process improves a shared rule to boost long-term returns across the whole population. The discovered rule consumes signals like rewards, transitions, and current value estimates, then emits parameter updates for both the policy and the value predictions.
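To make the two-level setup concrete, here is a toy sketch, not the paper's method. It assumes a tiny linear update rule with two weights (a stand-in for the learned rule), multi-armed bandits as the "environments", and plain hill climbing as the outer process. All names (`run_agent`, `population_score`, `discover_rule`, `TASKS`) are hypothetical and chosen only for illustration.

```python
import random

def run_agent(rule, arm_means, steps=200, seed=0):
    """Inner loop: one agent learns a bandit task using the shared rule."""
    w_reward, w_value = rule  # weights of the toy, learnable update rule
    rng = random.Random(seed)
    values = [0.0] * len(arm_means)
    total = 0.0
    for _ in range(steps):
        if rng.random() < 0.1:  # fixed exploration rate (an assumption)
            arm = rng.randrange(len(values))
        else:
            arm = max(range(len(values)), key=lambda a: values[a])
        reward = arm_means[arm] + rng.gauss(0.0, 0.1)
        # The rule consumes the reward and the current value estimate
        # and emits the change applied to that estimate.
        values[arm] += w_reward * reward + w_value * values[arm]
        total += reward
    return total / steps

def population_score(rule, tasks):
    """Average return of the shared rule across the whole population."""
    return sum(run_agent(rule, t, seed=i) for i, t in enumerate(tasks)) / len(tasks)

def discover_rule(tasks, generations=200, sigma=0.1, seed=0):
    """Outer loop: hill-climb on the rule weights (a simple stand-in optimizer)."""
    rng = random.Random(seed)
    best = (0.0, 0.0)  # start from a rule that never updates anything
    best_score = population_score(best, tasks)
    for _ in range(generations):
        cand = tuple(max(-1.0, min(1.0, w + rng.gauss(0.0, sigma))) for w in best)
        score = population_score(cand, tasks)
        if score > best_score:  # keep a candidate only if the population improves
            best, best_score = cand, score
    return best, best_score

# A small population of tasks: 20 bandits with 5 arms each.
_task_rng = random.Random(42)
TASKS = [[_task_rng.random() for _ in range(5)] for _ in range(20)]

if __name__ == "__main__":
    rule, score = discover_rule(TASKS)
    print("discovered rule weights:", rule)
    print("population score:", round(score, 3))
```

The design point this illustrates is that the rule is scored on the average over many tasks, so a weight setting that only helps one bandit is unlikely to be kept.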

On the Atari benchmark, the discovered rule surpassed all existing rules, and on challenging unseen benchmarks it outperformed several strong RL baselines. Learning the rule across many tasks encourages updates that are stable, sample aware, and robust, because brittle tricks do not survive aggregation over diverse experience.

The result suggests a shift where researchers curate training diversity and guardrails, and the system automatically discovers the low-level learning dynamics. The discovery relied on large-scale experiments, which supplied enough variation for the rule to generalize rather than overfit to a single domain.

🏆 GitHub Resource: Convert PDF documents to Markdown format using DeepSeek-OCR with FastAPI backend.

Screenshot of the DeepSeek-OCR PDF to Markdown Converter repository page in a web browser, showing the file list and README. The README describes a powerful OCR solution that converts PDF documents to Markdown format using DeepSeek-OCR with FastAPI, and mentions a batch processing script and a REST API for flexible document conversion. Its quick-start instructions: place PDFs in the docs directory, run the PDF-to-Markdown processor script, and ensure the DeepSeek-OCR API is running (Docker setup described in the README).

This new GitHub repo gave DeepSeek’s newly released OCR model 10,000 PDFs to convert to Markdown.

It averaged less than 1 second per page. Hardware: 1x A6000 Ada GPU on a Ryzen 1700 with 32 GB of RAM. The model was Dockerized with FastAPI in a WSL environment.
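A batch run like this boils down to a loop over a folder of PDFs. The sketch below is a minimal illustration and is not taken from the repo: `convert_directory` is a hypothetical helper, and `convert` is a placeholder callable standing in for whatever call reaches the OCR service (for example, an HTTP request to the FastAPI backend). It also records per-file timings, which is one way to check a seconds-per-page claim on your own hardware.

```python
import time
from pathlib import Path
from typing import Callable, Dict

def convert_directory(
    src_dir: str,
    out_dir: str,
    convert: Callable[[bytes], str],
) -> Dict[str, float]:
    """Convert every PDF in src_dir to a .md file in out_dir.

    `convert` is a placeholder for the real OCR call; it takes the raw
    PDF bytes and returns Markdown text. Returns seconds spent per file.
    """
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    timings: Dict[str, float] = {}
    for pdf in sorted(src.glob("*.pdf")):
        start = time.perf_counter()
        markdown = convert(pdf.read_bytes())
        (out / f"{pdf.stem}.md").write_text(markdown, encoding="utf-8")
        timings[pdf.name] = time.perf_counter() - start
    return timings

if __name__ == "__main__":
    # Stub converter so the sketch runs without an OCR service.
    def fake_convert(data: bytes) -> str:
        return f"# Converted document ({len(data)} bytes)\n"

    print(convert_directory("docs", "markdown_out", fake_convert))
```

Passing the converter in as a function keeps the loop testable without a GPU and lets the real service call be swapped in later.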

📡 Reddit sues Perplexity and 3 data brokers for allegedly scraping Reddit conversations without permission in a New York federal filing.

Reddit sues Perplexity for scraping data to train AI system.

The complaint says scrapers pulled Reddit content from Google results to resell it, and alleges Perplexity bought that data, while Reddit asks for damages and an injunction. Perplexity denies wrongdoing and says it provides factual answers while defending open information access.

The core point is whether harvesting through Google results and reseller feeds still counts as circumvention of Reddit’s protections and terms rather than fair public indexing, which is the line this case will decide. Reddit already licenses data to Google and OpenAI, including a Google deal reportedly worth $60M per year, which sets the baseline for paid access. Earlier, in Jun-25, Reddit also sued Anthropic over related scraping claims; that case is still ongoing, and no final judgment or settlement has been publicly announced.

💼 Meta is cutting 600 jobs in its AI org, hitting Fundamental AI Research (FAIR), AI infrastructure, and AI product teams, while sparing the elite TBD Lab that houses many recent hires.

An internal memo from Chief AI Officer Alexandr Wang says smaller teams will move faster and give each person more scope and impact. Affected staff are being urged to apply for roles inside Meta, with reports of a notice period into late Nov-25.

The restructuring centers power in Superintelligence Labs and keeps TBD Lab untouched while hiring continues for select AI roles. This comes alongside a $27B financing deal with Blue Owl to fund the Hyperion data center in Louisiana, signaling a heavier push on compute.

Inside Meta, teams had been competing for GPU access, and parts of the org were seen as bloated, so leadership is consolidating around smaller, talent-dense groups. In plain terms, broad research like FAIR loses headcount while small model-focused groups that own GPUs and ship product gain clout.

🇨🇳 Deep dive: China is flooding its factories with robots faster than anyone else.

China installed about 300K new robots in 2024, more than the rest of the world combined, and has over 2mn already working day and night.

The United States trails far behind in third place, with China installing nearly 10 times as many factory robots as the US.

China pushed this through a long policy drive that pumped cheap loans, subsidies, and clear targets into robotics, similar to the pushes that scaled electric vehicles and AI. China now makes about 33% of the world’s industrial robots, up from about 25% in 2023, while Japan fell to 29%, which signals a shift in where core automation hardware is built.

On the factory floor, robots handle welding, pick and place, and assembly, and AI software watches machine data to flag wear, tune cycles, and cut downtime. The United States installed 34,000 and Japan 44,000 last year, and China runs 5x the robot stock of the United States, so the productivity gap widens in process-heavy sectors like autos and electronics.
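As a quick sanity check on the installation gap, the ratios can be worked out from the figures quoted in this section. This is a sketch using the rounded numbers above (about 300K for China, 44,000 for Japan, 34,000 for the United States), so the results are approximate. The names `installs` and `ratio` are just illustrative.

```python
# 2024 robot installations as quoted in this section (China's figure is rounded).
installs = {"China": 300_000, "Japan": 44_000, "United States": 34_000}

def ratio(a: str, b: str) -> float:
    """How many times more robots country a installed than country b."""
    return installs[a] / installs[b]

china_vs_us = ratio("China", "United States")
china_vs_japan = ratio("China", "Japan")

print(f"China vs United States: {china_vs_us:.1f}x")
print(f"China vs Japan: {china_vs_japan:.1f}x")
```

On these rounded inputs the China-to-US ratio comes out just under 9x, consistent with the "nearly 10 times" framing.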

Domestic supply is catching up fast, with about 60% of robots installed in China now also made in China, though top sensors, drives, and some chips still come mainly from Germany and Japan.

Humanoid robots are outside these counts, yet Chinese start-ups are accelerating, and entry models now cost about $6,000, even as many premium components remain imported.

Installer expertise is a constraint, but China has a large pool of electricians and controls programmers, and specialist pay has climbed to about $60,000 a year, which pulls more talent in.

IMO, this lead rests on policy plus deep supply chains plus software-first operations, so expect lower unit costs and faster delivery cycles from Chinese factories compared with many peers.

A recent report by The Telegraph also said “Western executives who visit China are coming back terrified.”

Executives report dark factories in China, i.e. lines run with minimal staff because robots handle welding, assembly, loading, and high speed pick and place, so the lights are literally not required for work.

Policy helped build this, since the Made in China program reimburses around 20% of robot spending under “jiqi huanren” which means replacing humans with machines, and local governments stack grants and procurement to cut risk for adopters.

Demand signals are real too, since BYD’s UK sales jumped from 5,260 to 35,604 in the year to September, and Chinese brands iterate models in about half the European timeline.

Demographics push adoption as well, because automation offsets a shrinking working age population while locking in export leverage across batteries, solar, wind, and drones.

The World Economic Forum’s report noted that China’s big edge comes from its concentrated manufacturing networks—like the Yangtze River Delta and Greater Bay Area—where suppliers, integrators, and customers all work nearby. This clustering shortens testing times, letting companies try out new components on real production lines in days or weeks. It fuels a self-reinforcing cycle: deploying automation gathers data, which improves systems, leading to even more scaling. The evolution of the smartphone supply chain built precision and miniaturization skills that now lift EVs, industrial robots, and factory tools—proof of how shared innovation reduces costs and enhances performance.

Building advanced robots needs an incredible range of ultra-precise parts. A standard industrial robot packs in motors, gears, torque sensors, optical encoders, circuit boards, connectors, cameras, IMUs, and custom actuators—all running in sync. Many of these components must be produced at micron-level accuracy while keeping costs low.

Take harmonic reducers, for instance—these compact gear systems used in robot joints were once dominated by Japanese and German brands like Sumitomo and Harmonic Drive. But Chinese manufacturers have caught up fast. Green Harmonic, a company from Suzhou, now makes drives with similar performance but 30–50% cheaper, holding over 30% of China’s market and expanding abroad.

That’s a wrap for today, see you all tomorrow.

Rohan's Bytes