Taxi Route Optimization
with Reinforcement Learning

A tabular Q-Learning agent that masters pickup & drop-off routing in the Gymnasium Taxi-v3 environment through iterative Bellman updates.

🏆 Avg Reward (evaluation)
Success Rate (test episodes)
👣 Avg Steps (to complete)
🔄 Episodes Trained
Train Time
🎲 Final Epsilon (exploration)

How It Works

01 State Space

500 discrete states (25 × 5 × 4) encoding the taxi position (5×5 grid), passenger location (4 spots + in-taxi), and destination (4 spots).

s = row×100 + col×20 + pass×4 + dest
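The packing formula above can be sketched in Python together with its inverse. This is a minimal illustration, not the project's own code: the function names `encode` and `decode` are hypothetical, and the passenger value 4 is assumed to mean "in taxi", as described above.

```python
def encode(row: int, col: int, passenger: int, dest: int) -> int:
    """Pack the four state components into one index in [0, 500)."""
    return row * 100 + col * 20 + passenger * 4 + dest


def decode(state: int) -> tuple[int, int, int, int]:
    """Invert encode(): recover (row, col, passenger, dest) from the index."""
    row, rest = divmod(state, 100)      # 100 = 5 cols * 5 passenger * 4 dest
    col, rest = divmod(rest, 20)        # 20 = 5 passenger * 4 dest
    passenger, dest = divmod(rest, 4)   # 4 destinations
    return row, col, passenger, dest


# Example: taxi at row 3, col 1, passenger at spot 2, destination 0.
# encode(3, 1, 2, 0) == 328 and decode(328) == (3, 1, 2, 0)
```

The multipliers 100, 20, and 4 are just the sizes of the fields to the right of each component, which is why repeated `divmod` recovers them.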

02 Q-Table Update

The Bellman update iteratively refines each state-action value, balancing the immediate reward against the discounted value of the best next action.

Q(s,a) ← Q(s,a) + α[r + γ·max_a′ Q(s′,a′) − Q(s,a)]
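One way the update rule could look with a NumPy Q-table is sketched below. The function name `q_update`, the default values of `alpha` and `gamma`, and the handling of terminal transitions (no bootstrapping once the episode has ended) are assumptions for illustration; the page does not state them.

```python
import numpy as np


def q_update(q_table, state, action, reward, next_state, terminated,
             alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning update in place and return the TD error."""
    # Assumption: a terminal state has no future value, so skip the max term.
    best_next = 0.0 if terminated else float(np.max(q_table[next_state]))
    td_target = reward + gamma * best_next
    td_error = td_target - q_table[state, action]
    q_table[state, action] += alpha * td_error
    return td_error


# 500 states x 6 actions, matching the Taxi-v3 table shape.
q_table = np.zeros((500, 6))
q_update(q_table, state=328, action=4, reward=-1, next_state=328, terminated=False)
# q_table[328, 4] is now 0 + 0.1 * (-1 + 0.99 * 0 - 0) = -0.1
```

Returning the TD error is optional, but it is a convenient quantity to log when checking whether learning has converged.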

03 ε-Greedy Policy

Balances exploration (random actions) with exploitation (greedy Q-table lookup). Epsilon decays exponentially until it reaches the floor ε_min.

ε ← max(ε_min, ε × ε_decay)
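The policy and the decay rule might be written as follows. This is a sketch under stated assumptions: the names `select_action` and `decay_epsilon` are hypothetical, and the defaults `epsilon_min=0.01` and `epsilon_decay=0.995` are placeholder values, not the project's hyperparameters.

```python
import numpy as np


def select_action(q_table, state, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(q_table.shape[1]))  # explore
    return int(np.argmax(q_table[state]))           # exploit


def decay_epsilon(epsilon, epsilon_min=0.01, epsilon_decay=0.995):
    """Multiplicative decay with a floor: eps <- max(eps_min, eps * decay)."""
    return max(epsilon_min, epsilon * epsilon_decay)


rng = np.random.default_rng(0)
q_table = np.zeros((500, 6))
action = select_action(q_table, state=0, epsilon=1.0, rng=rng)
epsilon = decay_epsilon(1.0)  # 0.995 with the placeholder defaults
```

If the decay is applied once per episode (an assumption; it could equally be applied per step), these placeholder values would bring ε from 1.0 down to the 0.01 floor after roughly 920 episodes.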

04 Rewards

Reward signal: +20 for a successful drop-off, −10 for an illegal pickup or drop-off, and −1 for every other step (time pressure). Because every step is penalized, shorter routes earn higher returns.

r ∈ {+20, −10, −1}
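To see how these rewards add up over an episode, the small helper below computes the undiscounted return. It is illustrative only: the name `episode_return` is hypothetical, and it assumes each step earns exactly one of the three rewards, so the delivering step gets +20 rather than +20 and −1.

```python
def episode_return(steps: int, illegal_actions: int = 0, delivered: bool = True) -> int:
    """Undiscounted return of one episode under r in {+20, -10, -1}.

    Assumes every step earns exactly one reward: +20 for the delivering
    step, -10 for an illegal pickup/drop-off, and -1 for any other step.
    """
    final = 1 if delivered else 0
    ordinary = steps - illegal_actions - final
    if ordinary < 0:
        raise ValueError("steps must cover the illegal actions and the drop-off")
    return 20 * final - 10 * illegal_actions - ordinary


# A clean 13-step delivery: 12 ordinary steps and one drop-off.
# episode_return(13) == 20 - 12 == 8
```

Under this scheme a successful episode with no illegal actions returns 21 − n for n steps, so the return directly measures route length.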

Hyperparameters

Learning Curves

📈 Reward per Episode

👣 Steps to Completion

📉 Epsilon Decay

🏆 Eval Reward Distribution

Evaluation Results

# Total Reward Steps Success
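The headline metrics (average reward, average steps, success rate) can be derived from per-episode rows like those in this table. The sketch below assumes each row is a dict with the hypothetical keys `total_reward`, `steps`, and `success`; how the project defines "success" is not stated on this page.

```python
from statistics import mean


def summarize(episodes):
    """Aggregate per-episode results into average reward, steps, success rate.

    `episodes` is a list of dicts with keys "total_reward", "steps" and
    "success" (hypothetical field names mirroring the table columns).
    """
    if not episodes:
        raise ValueError("need at least one evaluation episode")
    return {
        "avg_reward": mean(e["total_reward"] for e in episodes),
        "avg_steps": mean(e["steps"] for e in episodes),
        "success_rate": sum(bool(e["success"]) for e in episodes) / len(episodes),
    }
```

Reporting the success rate alongside the average reward is useful because a few timed-out episodes can dominate the mean reward.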

Q-Table State Explorer

Decode any of the 500 Taxi-v3 states and inspect the Q-values for each action.
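A lookup of this kind could be sketched as below. The function name `explore_state` and the returned dict layout are hypothetical; the action order and location colors follow the Gymnasium Taxi-v3 documentation, and the Q-table is assumed to be a 500×6 NumPy array.

```python
import numpy as np

# Taxi-v3 action indices 0..5 and passenger/destination location indices.
ACTION_NAMES = ["south", "north", "east", "west", "pickup", "dropoff"]
LOCATION_NAMES = ["Red", "Green", "Yellow", "Blue", "in taxi"]


def explore_state(q_table, state):
    """Decode a state index and report its Q-values and greedy action."""
    row, rest = divmod(state, 100)
    col, rest = divmod(rest, 20)
    passenger, dest = divmod(rest, 4)
    q_values = q_table[state]
    return {
        "row": row,
        "col": col,
        "passenger": LOCATION_NAMES[passenger],
        "destination": LOCATION_NAMES[dest],
        "q_values": dict(zip(ACTION_NAMES, q_values.tolist())),
        "best_action": ACTION_NAMES[int(np.argmax(q_values))],
    }
```

Note that `np.argmax` returns the first index on ties, so an untrained all-zero row would always report "south" as the best action.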

Row
Col
Passenger
Destination
Best Action
Taxi Passenger Destination

Episode Replay

Step through a test episode and watch the agent navigate the grid.
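A replay like this could be produced by running the greedy policy and recording each step. The sketch below assumes a Gymnasium-style environment (`reset` returning `(obs, info)`, `step` returning a 5-tuple); the function name `replay_episode`, the log field names, and the file name in the usage comment are hypothetical.

```python
import numpy as np


def replay_episode(env, q_table, max_steps=200, seed=None):
    """Run one greedy episode and return a step log as a list of dicts."""
    state, _ = env.reset(seed=seed)
    log, total = [], 0
    for step in range(1, max_steps + 1):
        action = int(np.argmax(q_table[state]))  # greedy, no exploration
        state, reward, terminated, truncated, _ = env.step(action)
        total += reward
        log.append({"step": step, "action": action,
                    "reward": reward, "cumulative_reward": total})
        if terminated or truncated:
            break
    return log


# Possible usage, assuming gymnasium is installed and a trained table exists:
#   import gymnasium as gym
#   env = gym.make("Taxi-v3")
#   q_table = np.load("q_table.npy")  # hypothetical file name
#   for entry in replay_episode(env, q_table, seed=0):
#       print(entry)
```

Each log entry carries the same three fields the replay panel displays: the step number, the action taken, and the cumulative reward so far.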

Step: —
Cumulative Reward: —
Action: —
Step Log