Autoregressive Large Language Models are Computationally Universal
Your LLM chatbot isn't just chatting - it's secretly a full computer in disguise 💡
Hidden in plain text: every LLM is already a complete computer.
This paper proves that an LLM can perform any computation a classical computer can, using nothing beyond its ordinary text-completion behavior
👉 Proposals in this paper
• Achieves Turing completeness without external mechanisms or weight modifications
• Demonstrates universal computation using only extended autoregressive decoding
Original Problem 🧩:
LLMs' computational capabilities relative to classical models of computation remain unclear. Previous work relied on external mechanisms for universal computation.
Solution in this Paper 🔧:
• Introduces extended autoregressive decoding for processing long inputs
• Proves LLMs can simulate Lag systems, a classical computation model
• Develops a universal Lag system with 2027 rules from a 15-state, 2-symbol Turing machine
• Creates a system prompt for gemini-1.5-pro-001 to apply these rules
Key Insights from this Paper 💡:
• LLMs with extended autoregressive decoding are Turing complete
• Generalized autoregressive decoding only simulates linear bounded automata
• LLMs offer a more natural interface for expressing computational intent
• Universal computation might be inherent to LLM architecture, not learned
🔍 The central claim is that autoregressive decoding of a transformer-based LLM can realize universal computation, without external intervention or modification of the model's weights. Specifically, the paper proves that gemini-1.5-pro-001 with extended autoregressive (greedy) decoding is a general purpose computer.
🧠 Generalized autoregressive decoding allows processing of arbitrarily long input strings by appending emitted tokens to the end of the sequence as the context window advances. Extended autoregressive decoding further allows the model to emit multiple tokens (specifically, a pair of tokens) for certain contexts.
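The decoding loop described above can be sketched in a few lines. This is an illustrative sketch, not the paper's implementation: `step` is a hypothetical stand-in for one greedy decoding call that maps the current context window to one emitted token (generalized decoding) or a short sequence of tokens (extended decoding, which the paper restricts to a pair).

```python
# Sketch of extended autoregressive decoding with a sliding context window.
# `step` is a hypothetical stand-in for one greedy decoding call: given the
# current context, it returns the token(s) to emit, or None to halt.

def extended_autoregressive_decode(tokens, step, window=8, max_steps=100):
    """Slide a fixed-size context over the sequence, appending whatever
    the model emits and advancing the window past consumed input."""
    pos = 0  # left edge of the context window
    for _ in range(max_steps):
        context = tokens[pos:pos + window]
        emitted = step(context)          # one or more tokens
        if emitted is None:              # model signals halt
            break
        tokens.extend(emitted)           # append to the end of the sequence
        pos += 1                         # window advances one token
    return tokens
```

The key point the sketch makes concrete: because emitted tokens are appended while the window advances, the effective memory string can grow without bound even though the model only ever sees a fixed-size context.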
🔢 A Lag system is a classical model of computation consisting of a finite set of rules that operate on a memory string. The paper shows that the behavior of an LLM under extended autoregressive decoding corresponds to a Lag system.
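A lag-2 system (the variant relevant here) is easy to simulate directly; the hypothetical sketch below shows the mechanics. Each step inspects the first two symbols of the memory string, appends the word given by the matching rule, then deletes the leftmost symbol. The rule table and halting convention are illustrative assumptions, not taken from the paper.

```python
# Hypothetical simulator for a lag-2 system: rules map the leading symbol
# pair to a word that is appended, after which the leftmost symbol is deleted.

def run_lag2(memory, rules, max_steps=1000):
    memory = list(memory)
    for _ in range(max_steps):
        if len(memory) < 2:
            break                     # too short to read a pair
        key = (memory[0], memory[1])
        if key not in rules:          # no applicable rule: halt
            break
        memory.extend(rules[key])     # append the rule's word
        memory.pop(0)                 # delete the leftmost symbol
    return "".join(memory)
```

With rules `{("a", "b"): "c", ("b", "c"): ""}`, the memory string "ab" evolves as "ab" → "bc" → "c" and then halts. The correspondence to the decoding loop above is direct: reading the leading pair matches consuming the context window, and appending the rule's word matches emitting tokens.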
🖥️ The paper demonstrates this by:
• Proving that a Lag system can simulate a universal Turing machine
• Showing that an LLM can simulate a universal Lag system
• Developing a prompt that drives a specific LLM (gemini-1.5-pro-001) to correctly apply each rule of the universal Lag system
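Putting the pieces together, the third step amounts to an outer loop in which the LLM plays the role of the rule table. The sketch below is a loose illustration under stated assumptions: `llm` is a hypothetical deterministic completion call (a stand-in for greedy decoding of gemini-1.5-pro-001), `SYSTEM_PROMPT` is an abridged placeholder for the paper's actual prompt encoding the rules, and the "HALT" convention is invented for the example.

```python
# Illustrative driver loop: the model, prompted with the lag-system rules,
# returns the word to append for the current leading symbol pair.
# `llm` and SYSTEM_PROMPT are hypothetical stand-ins, not the paper's code.

SYSTEM_PROMPT = "Apply the lag-system rule for the given symbol pair."  # abridged

def drive(llm, memory, max_steps=1000):
    memory = list(memory)
    for _ in range(max_steps):
        if len(memory) < 2:
            break
        reply = llm(SYSTEM_PROMPT + "\nPair: " + memory[0] + memory[1])
        if reply == "HALT":        # assumed halting convention
            break
        memory.extend(reply)       # model's completion = word to append
        memory.pop(0)              # advance: drop the leftmost symbol
    return "".join(memory)
```

If the model applies every rule correctly under deterministic decoding, this loop traces out exactly the Lag-system computation, which is what the paper's correctness argument requires.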
In short, the paper argues that regular LLMs are stealth computers: no extra parts needed
🤖 The implications are significant:
• It establishes that current LLMs are already capable of universal computation
• It suggests that LLMs offer a more natural interface for humans to express computational intent compared to classical computers
• It implies that long computational sequences will be necessary for LLMs to solve arbitrary tasks
• It raises the possibility that the universal computation capability of LLMs might be an inherent property of their architecture, rather than something learned during training
🧮 The paper uses U15, a universal Turing machine with 15 states and 2 tape symbols. This machine was chosen because it's one of the smallest known universal Turing machines, making the proof more compact and manageable.
🔬 The authors developed a system prompt that drives gemini-1.5-pro-001, under deterministic (greedy) decoding, to correctly apply each of the 2027 production rules of the universal Lag system derived from U15.