What are infinite loops in RL?

In Reinforcement Learning, an agent follows a policy that maps states to actions. An infinite loop occurs when a flaw in that policy causes the agent to revisit the same states repeatedly without making progress toward the goal: it repeats the same sequence of actions indefinitely and never reaches its target. This is a well-known failure mode with real implications for RL systems.

How agents get stuck

Agents get trapped in infinite loops when two conditions are met:
  1. Deterministic policy: The policy always returns the same action for a given state
  2. No exploration: The agent never deviates from the policy’s prescribed actions
When these conditions exist together, a single flaw in the policy can cause the agent to loop forever.
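The two conditions can be seen in a minimal sketch. This hypothetical two-state world is not part of the demo; it just shows that a deterministic policy followed without exploration cycles forever once it contains a loop:

```javascript
// Hypothetical two-state world: a deterministic policy with no
// exploration ping-pongs between states A and B forever.
const policy = { A: 'goRight', B: 'goLeft' };  // deterministic: same state, same action
const transition = { 'A:goRight': 'B', 'B:goLeft': 'A' };

function run(start, maxSteps) {
  let state = start;
  const visited = [state];
  for (let i = 0; i < maxSteps; i++) {   // cap steps so the sketch terminates
    const action = policy[state];        // no exploration: always follow the policy
    state = transition[`${state}:${action}`];
    visited.push(state);
  }
  return visited;
}

console.log(run('A', 6).join(' → '));    // A → B → A → B → A → B → A
```

With either condition removed (a stochastic policy, or occasional random actions), the cycle can eventually break on its own.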

The bug in this demo

The demo presents a 3×3 grid world where:
  • 🤖 The agent starts at position (0,0)
  • 🏆 The goal is at position (2,2)
  • 🧱 A wall blocks position (1,1)
The agent’s policy contains a deliberate bug at position (1,2) — just one step above the goal:
function badPolicy(pos) {
  // Positions are (row, col); actions: 1 = right, 2 = down, 3 = left, 0 = up
  const key = `${pos[0]},${pos[1]}`;
  const policy = {
    '0,0': 1, '0,1': 1, '0,2': 2,
    '1,2': 3, // ← BUG: should be 2 (down)
    '1,0': 0,
  };
  return policy[key] ?? 1; // default for unlisted states: move right
}
At position (1,2), the policy returns action 3 (left) instead of action 2 (down). This causes the agent to:
  1. Move left from (1,2) to (1,1) — but that’s the wall!
  2. Stay at (1,2) because the wall blocks movement
  3. Try to move left again from (1,2)
  4. Repeat forever
The agent is just one step away from the goal but can never reach it because of this single policy flaw.
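The stuck behavior can be reproduced directly. This sketch wraps `badPolicy` in a minimal environment; the `step` function, action deltas, and boundary handling are assumptions about how the demo works, not its actual code:

```javascript
// A sketch of the grid environment (the demo's real step logic may differ).
// Positions are [row, col]; deltas assume 0 = up, 1 = right, 2 = down, 3 = left.
const DELTAS = { 0: [-1, 0], 1: [0, 1], 2: [1, 0], 3: [0, -1] };
const WALL = '1,1';

function badPolicy(pos) {
  const key = `${pos[0]},${pos[1]}`;
  const policy = {
    '0,0': 1, '0,1': 1, '0,2': 2,
    '1,2': 3, // ← BUG: should be 2 (down)
    '1,0': 0,
  };
  return policy[key] ?? 1;
}

function step(pos, action) {
  const [dr, dc] = DELTAS[action];
  const next = [pos[0] + dr, pos[1] + dc];
  const offGrid = next[0] < 0 || next[0] > 2 || next[1] < 0 || next[1] > 2;
  const hitWall = `${next[0]},${next[1]}` === WALL;
  return offGrid || hitWall ? pos : next; // a blocked move leaves the agent in place
}

// After 20 steps the agent sits at (1,2), one move from the goal, and stays there.
let pos = [0, 0];
for (let i = 0; i < 20; i++) pos = step(pos, badPolicy(pos));
console.log(pos); // [ 1, 2 ]
```

Running more steps changes nothing: every further step tries to move left into the wall and leaves the position untouched.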

Real-world implications

Infinite loops aren’t just a theoretical problem. They occur in real RL systems when:
  • Training policies have errors: Even well-trained policies can have edge cases
  • Environment changes: A policy trained on one environment might loop in a modified version
  • Sparse rewards: When rewards are rare, agents can get stuck exploring the same areas
  • Deterministic execution: Production systems often use deterministic policies for reproducibility

Connection to LLM agents

The same infinite loop problem applies to LLM-based agents that use tools:
  • An agent might repeatedly reformulate the same search query
  • An agent might retry a failed API call without changing parameters
  • An agent might loop through the same reasoning steps
Frameworks like LangChain and AutoGen implement iteration limits for exactly this reason.
See the LLM agents page to learn more about how this problem affects modern AI systems.
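A minimal version of such an iteration limit might look like the following. The names and the `stepFn` contract here are illustrative, not LangChain's or AutoGen's actual API:

```javascript
// A generic iteration-limit guard for an agent loop (illustrative sketch).
// stepFn takes the history so far and returns { done, toolCall?, answer? }.
function runAgent(stepFn, maxIterations = 10) {
  const history = [];
  for (let i = 0; i < maxIterations; i++) {
    const step = stepFn(history);
    if (step.done) return { ok: true, answer: step.answer };
    history.push(step.toolCall); // record the tool call and continue
  }
  // Bail out instead of looping forever on a repeating agent
  return { ok: false, reason: `exceeded ${maxIterations} iterations` };
}

// An agent that keeps reformulating the same query is cut off:
const looping = () => ({ done: false, toolCall: 'search("same query")' });
console.log(runAgent(looping, 5).ok); // false
```

The guard cannot tell a productive long run from a loop; it only bounds the damage, which is why frameworks pair limits with logging of the truncated history.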

Visualizing the problem

The live demo shows this problem in action. Watch the left panel ("Sin Protección", i.e. "No Protection") to see the agent get stuck:
  1. The agent successfully navigates from (0,0) to (0,2)
  2. Then moves down to (1,2) — one step from victory
  3. Gets stuck trying to move left forever
  4. The “Repeticiones” (“Repetitions”) counter climbs as the agent repeats the same failed action
The visual feedback makes it obvious when the agent is stuck, but in a real system without visualization, this problem could go unnoticed for a long time.
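Without a visualization, the same signal can come from a counter like the demo's "Repeticiones" display. This is a hypothetical sketch, not the demo's code: count how often each state/action pair recurs, and treat a climbing count as evidence the agent is stuck:

```javascript
// Hypothetical repetition counter: tracks how often the agent has taken
// the same action from the same state.
function makeRepetitionCounter() {
  const seen = new Map();
  return function record(state, action) {
    const key = `${state}|${action}`;
    const count = (seen.get(key) ?? 0) + 1;
    seen.set(key, count);
    return count; // 1 on first sight, climbs on every repeat
  };
}

const record = makeRepetitionCounter();
record('1,2', 3); // → 1
record('1,2', 3); // → 2: the agent is repeating itself
```

In practice the run loop would check the returned count against a threshold and halt or switch strategies once it is exceeded.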

Next steps

Cycle detection

Learn how to detect and break out of infinite loops

Prevention strategies

Explore common solutions for avoiding infinite loops