Taxi Route Optimization with Reinforcement Learning

A tabular Q-learning agent that learns pickup and drop-off routing in the Gymnasium Taxi-v3 environment through iterative Bellman updates.

🏆 Avg Reward (evaluation): —
✅ Success Rate (test episodes): —
👣 Avg Steps (to complete): —
🔄 Episodes Trained: —
⏱ Train Time: —
🎲 Final Epsilon (exploration): —

How It Works

State Space

500 discrete states (25 × 5 × 4) encoding the taxi position (5×5 grid), the passenger location (4 pickup spots or in the taxi), and the destination (4 spots).

s = row×100 + col×20 + pass×4 + dest
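The encoding formula above can be sketched in Python as follows. This is a minimal illustration, not the project's own code: the function name `encode_state` is hypothetical, and the index ranges assume the Taxi-v3 convention that passenger index 4 means "in the taxi".

```python
def encode_state(row: int, col: int, passenger: int, dest: int) -> int:
    """Pack the four state components into one integer in the range 0..499."""
    if not (0 <= row < 5 and 0 <= col < 5):
        raise ValueError("row and col must be in 0..4")
    if not 0 <= passenger < 5:
        raise ValueError("passenger must be in 0..4 (4 = in taxi, by assumption)")
    if not 0 <= dest < 4:
        raise ValueError("dest must be in 0..3")
    # Same weights as the formula: 100 = 5*5*4, 20 = 5*4, 4 = number of destinations.
    return row * 100 + col * 20 + passenger * 4 + dest
```

For example, a taxi at row 3, column 1 with passenger index 2 and destination index 0 maps to 300 + 20 + 8 + 0 = 328.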

Q-Table Update

The Bellman update iteratively refines state-action values, balancing the immediate reward against discounted future rewards.

Q(s,a) ← Q(s,a) + α[r + γ·max_a' Q(s',a') − Q(s,a)]
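A single update step might look like the sketch below. It is not taken from the project: `q_update` and the nested-list table are hypothetical, and the `terminated` flag (which skips bootstrapping from a terminal state) is an added assumption that the formula itself does not spell out.

```python
def q_update(q_table, s, a, r, s_next, alpha, gamma, terminated=False):
    """Apply one tabular Q-learning update in place and return the TD error."""
    # Assumption: a terminal next state contributes no future value.
    best_next = 0.0 if terminated else max(q_table[s_next])
    td_target = r + gamma * best_next        # r + γ·max Q(s',a')
    td_error = td_target - q_table[s][a]     # bracketed term in the formula
    q_table[s][a] += alpha * td_error
    return td_error


# Hypothetical table: 500 states x 6 actions, initialised to zero.
q_table = [[0.0] * 6 for _ in range(500)]
```

With α = 0.5, γ = 0.9, r = −1 and a best next-state value of 5.0, the target is −1 + 0.9 × 5.0 = 3.5, so a Q-value starting at 0 moves to 1.75.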

ε-Greedy Policy

Balances exploration (random actions) with exploitation (greedy Q-table lookup). Epsilon decays multiplicatively until it reaches the floor ε_min.

ε ← max(ε_min, ε × ε_decay)
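The policy and its decay rule can be sketched as two small functions. The names `select_action` and `decay_epsilon` are hypothetical, and the random tie-breaking among equally valued actions is a design choice of this sketch rather than something the source specifies.

```python
import random


def select_action(q_row, epsilon, rng=random):
    """Pick a random action with probability epsilon, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))          # explore
    best = max(q_row)
    # Break ties randomly so equal Q-values do not always favour action 0.
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def decay_epsilon(epsilon, epsilon_min, epsilon_decay):
    """One decay step: multiply by the decay factor, then clamp at the floor."""
    return max(epsilon_min, epsilon * epsilon_decay)
```

Passing a seeded `random.Random` instance as `rng` makes runs reproducible. How often `decay_epsilon` is called (per episode or per step) is left open here.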

Rewards

Reward signal: +20 for a successful drop-off, −10 for an illegal pickup or drop-off, and −1 for every other step (time pressure). Because every step is penalized, shorter routes earn higher returns.

r ∈ {+20, −10, −1}
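To make the reward arithmetic concrete, the sketch below computes the undiscounted return of an episode that ends in a successful drop-off. The helper `episode_return` is hypothetical. It assumes, following Gymnasium's Taxi-v3 behavior, that the +20 and −10 rewards replace the −1 step cost on the steps where they occur rather than adding to it.

```python
def episode_return(steps: int, illegal_actions: int = 0) -> int:
    """Total undiscounted reward for an episode ending in a successful drop-off.

    Assumes the final step earns +20 in place of -1, and each illegal
    pickup/drop-off earns -10 in place of -1.
    """
    if steps < 1 or not 0 <= illegal_actions < steps:
        raise ValueError("need steps >= 1 and 0 <= illegal_actions < steps")
    ordinary_steps = steps - 1 - illegal_actions   # steps that cost -1 each
    return 20 - ordinary_steps - 10 * illegal_actions
```

Under these assumptions a clean 13-step episode returns 20 − 12 = 8, while the same episode with one illegal action returns 20 − 11 − 10 = −1.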

Learning Curves

📈 Reward per Episode

Window: 100

👣 Steps to Completion

📉 Epsilon Decay

🏆 Eval Reward Distribution
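The "Window: 100" setting suggests the reward curve is smoothed over a sliding window. How the page actually smooths its data is not stated, so the following is only one plausible reading: a trailing mean that uses fewer points until the window fills. The name `moving_average` is hypothetical.

```python
def moving_average(values, window=100):
    """Trailing mean over up to `window` most recent values, one output per input."""
    if window < 1:
        raise ValueError("window must be >= 1")
    smoothed, running_sum = [], 0.0
    for i, v in enumerate(values):
        running_sum += v
        if i >= window:
            running_sum -= values[i - window]   # drop the value leaving the window
        smoothed.append(running_sum / min(i + 1, window))
    return smoothed
```

Keeping a running sum makes the pass linear in the number of episodes, regardless of window size.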

Evaluation Results

#	Total Reward	Steps	Success
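Rows with these columns can be aggregated into the headline metrics (average reward, success rate, average steps) along the lines of the sketch below. The function `summarize_evaluation` and the tuple layout are hypothetical. The sketch averages steps over all episodes; whether the page's "Avg Steps (to complete)" counts only successful episodes is not stated.

```python
from statistics import mean


def summarize_evaluation(episodes):
    """Aggregate (total_reward, steps, success) tuples into headline metrics."""
    if not episodes:
        raise ValueError("need at least one episode")
    rewards, steps, successes = zip(*episodes)
    return {
        "avg_reward": mean(rewards),
        "success_rate": sum(successes) / len(episodes),  # True counts as 1
        "avg_steps": mean(steps),
    }
```

Each input tuple corresponds to one table row, with the row number omitted.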

Q-Table State Explorer

Decode any of the 500 Taxi-v3 states and inspect the Q-values for each action.

State ID (0–499)

Row: —
Col: —
Passenger: —
Destination: —
Best Action: —

Legend: Taxi, Passenger, Destination
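Decoding a state ID is the inverse of the encoding formula in the State Space card and can be sketched with repeated `divmod` calls. The function names are hypothetical; the action order and location names follow Gymnasium's Taxi-v3 documentation, which this sketch assumes the explorer also uses.

```python
ACTIONS = ["south", "north", "east", "west", "pickup", "dropoff"]
PASSENGER = ["Red", "Green", "Yellow", "Blue", "in taxi"]
DESTINATION = ["Red", "Green", "Yellow", "Blue"]


def decode_state(state_id: int):
    """Invert s = row*100 + col*20 + pass*4 + dest into (row, col, pass, dest)."""
    if not 0 <= state_id < 500:
        raise ValueError("state_id must be in 0..499")
    rest, dest = divmod(state_id, 4)       # peel off destination (4 values)
    rest, passenger = divmod(rest, 5)      # then passenger location (5 values)
    row, col = divmod(rest, 5)             # what remains is row*5 + col
    return row, col, passenger, dest


def best_action(q_row):
    """Name of the action with the highest Q-value (first one wins on ties)."""
    return ACTIONS[max(range(len(q_row)), key=q_row.__getitem__)]
```

For example, `decode_state(328)` yields row 3, column 1, passenger index 2 and destination index 0, which `PASSENGER` and `DESTINATION` map to "Yellow" and "Red".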

