L2 · PAI-110

Plan With a State Machine

Design a finite-state controller in Python, control(obs), that switches modes (SEEK far away, ARRIVE up close, AVOID near a wall) to drive the real MuJoCo rover onto the goal pad and settle within tolerance.

Challenge

Try this first — before any explanation.

Your PID instinct from 2.2 says: one law, run it forever. So you write a single SEEK rule — full speed, steer toward goal_bearing — and let it rip. The rover charges across the arena and reaches the pad... then sails straight through it, swings back, and orbits the 0.12 m circle forever, full throttle the whole way. It never lands. One behavior can't both sprint there AND ease in. Those are two different jobs — and a wall in the way would be a third.
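The single-law attempt described above might look like the sketch below. It assumes obs is a dict with a goal_bearing entry and that control returns a (left, right) pair of wheel commands. The 0.6 forward speed and 1.2 steering gain are taken from the guided practice later on this page, not from a documented "full speed" value.

```python
def control(obs):
    """Single-law SEEK: constant forward speed, steer toward the goal."""
    b = obs["goal_bearing"]   # positive means the goal is to the left
    forward = 0.6             # assumed speed; never looks at goal_dist
    turn = 1.2 * b            # assumed steering gain
    # (left, right): a positive turn speeds up the right wheel
    return (forward - turn, forward + turn)
```

Nothing in this function reads goal_dist, so the command at 3 m and at 5 cm is identical, which is why the rover cannot ease onto the pad.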


The Bench

Write control(obs) as a finite-state controller — SEEK, then ARRIVE, with an AVOID branch — and land the real MuJoCo rover on the green pad.

MUJOCO · REAL PHYSICS · IN-BROWSER

Model

The idea, built visually.

Watch what your one rule does: it's perfect at 3 metres and disastrous at 10 centimetres. The exact command that gets you there — full speed, hard steer — is the command that flings you off the pad. No single law is right everywhere.

So stop asking one rule to do everything. Give each job its own room. In SEEK you only sprint and aim. In ARRIVE you only ease in — forward speed shrinks as goal_dist shrinks, so you coast to a stop on the pad instead of through it. And if a wall fills the rangefinder, AVOID takes over and pivots. Each mode is trivially simple because it has exactly one job. The intelligence isn't inside any room — it's in the doors: when goal_dist drops below half a metre, SEEK hands off to ARRIVE.

▣ Stage animation: One control law drawn as a single arrow that nails a far target then whips into a tight orbit around the green pad, ring counter climbing; the law splits into three labelled nodes SEEK / ARRIVE / AVOID, each holding one short line; directed arrows appear between them tagged with the trigger reads (goal_dist < 0.5 -> ARRIVE, range < 0.35 -> AVOID); ARRIVE's forward term visibly tapers as the rover glides onto the pad and the trajectory dot stops dead inside the circle.
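The "doors" can be sketched separately from the rooms. The function below only picks a mode from the two trigger reads. The names select_mode, ARRIVE_DIST, and AVOID_RANGE are hypothetical, and checking AVOID before ARRIVE is an assumed priority that the lesson does not state.

```python
ARRIVE_DIST = 0.5    # metres; SEEK hands off to ARRIVE below this
AVOID_RANGE = 0.35   # metres; a wall closer than this triggers AVOID


def select_mode(goal_dist, rng):
    """Return the mode name chosen from the two trigger reads."""
    # The lower bound skips a reading of 0, assumed here to mean "no echo".
    if 0.0 < rng < AVOID_RANGE:
        return "AVOID"
    if goal_dist < ARRIVE_DIST:
        return "ARRIVE"
    return "SEEK"


print(select_mode(3.0, 0.0))   # far from the goal, nothing in range
print(select_mode(0.3, 0.0))   # inside the half-metre door
print(select_mode(3.0, 0.2))   # wall fills the rangefinder
```

Keeping the transitions in one small function makes it easy to see, and to change, which door wins when two triggers fire at once.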

Guided practice

Build it up, step by step.

Step A (worked): read d = obs['goal_dist'], b = obs['goal_bearing'], rng = obs['range'] at the top of control(obs). These three reads pick the mode.
Step B (fill-the-blank): write the SEEK branch (forward = 0.6, turn = 1.2 * b, return (forward - turn, forward + turn)) and the door to ARRIVE (when goal_dist < ~0.5).
Step C (independent): write ARRIVE so forward tapers with distance (forward = k * d) and the rover settles inside tol, then add the AVOID branch (pivot in place when 0 < range < ~0.35).
Remember: positive goal_bearing means the goal is to your LEFT.
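One way Steps A to C could fit together is sketched below. It is not the reference solution. The gain K_ARRIVE = 1.2 is an assumption chosen so that forward speed is continuous at the 0.5 m door (1.2 * 0.5 = 0.6), PIVOT_RATE is a hypothetical value, and stopping both wheels once inside tol is one possible reading of "settles".

```python
TOL = 0.12          # metres, from the pass criterion
K_ARRIVE = 1.2      # assumed taper gain: matches SEEK speed at d = 0.5
PIVOT_RATE = 0.5    # hypothetical pivot command for AVOID


def control(obs):
    # Step A: read the three values that pick the mode.
    d = obs["goal_dist"]
    b = obs["goal_bearing"]
    rng = obs["range"]

    # AVOID: wall close ahead, pivot in place instead of driving forward.
    if 0.0 < rng < 0.35:
        return (-PIVOT_RATE, PIVOT_RATE)

    # ARRIVE: forward speed shrinks with the remaining distance.
    if d < 0.5:
        if d < TOL:
            return (0.0, 0.0)   # assumed: hold still once on the pad
        forward = K_ARRIVE * d
        turn = 1.2 * b
        return (forward - turn, forward + turn)

    # SEEK: sprint and aim (Step B).
    forward = 0.6
    turn = 1.2 * b
    return (forward - turn, forward + turn)
```

Because the mode is recomputed from obs on every call, this controller keeps no memory between steps. If the rover chatters at a door, adding hysteresis with a stored mode would be a natural next experiment.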

Feedback

How the Bench grades your run.

PASS WHEN: the rover drives onto the goal pad and comes within tol (0.12 m) inside the time budget, using a finite-state controller whose mode is chosen from goal_dist (and range), with an ARRIVE mode that eases forward to settle on the pad.

FAIL: closest approach is far from the pad — your steering sign is off. Positive goal_bearing means the goal is to your LEFT, so increase the right wheel and decrease the left.
FAIL: you reach the pad's neighbourhood but never settle inside tol — a single full-speed SEEK orbits the circle. Add an ARRIVE mode that scales forward with goal_dist (forward = k * goal_dist) so it coasts to a stop on the pad.
FAIL: stalled or drove into the wall — gate an AVOID mode on 0 < obs['range'] < ~0.35 that pivots in place (return (-turn, turn)) instead of driving forward.
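The steering-sign and pivot advice above can be checked on paper with standard differential-drive kinematics before running the Bench. The sketch assumes the two returned values act as left and right wheel speeds and that counter-clockwise yaw is positive. The track_width of 0.2 m is a made-up figure, not the rover's real geometry, and the helper names are hypothetical.

```python
def yaw_rate(left, right, track_width=0.2):
    """Differential-drive yaw rate; positive is a left (CCW) turn."""
    return (right - left) / track_width


def forward_speed(left, right):
    """Body forward speed is the mean of the two wheel speeds."""
    return (left + right) / 2.0


def steer(forward, bearing, gain=1.2):
    """Steering rule from the guided practice: (left, right)."""
    turn = gain * bearing
    return (forward - turn, forward + turn)


# Goal to the LEFT (positive bearing) should produce a left turn.
left, right = steer(0.6, 0.3)
assert yaw_rate(left, right) > 0

# The AVOID pivot (-turn, turn) rotates without moving forward.
assert forward_speed(-0.5, 0.5) == 0.0
assert yaw_rate(-0.5, 0.5) > 0
```

If the first assertion fails for your own steering rule, the two wheel terms are swapped, which matches the first FAIL message.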

Retrieve & space

Bring back what you've already mastered.

From 2.2: ARRIVE reuses the same steering term but scales forward speed down near the goal — is that the P, I, or D idea? (Proportional: command proportional to remaining distance.)
From 2.1: why read goal_dist and range once at the top of control(obs) and branch on them? (They are the SENSE/PERCEIVE values; the mode switch lives in DECIDE.)
From Module 1: map SEEK, ARRIVE, AVOID onto SENSE->PERCEIVE->DECIDE->ACT — each mode runs the whole loop; the state machine lives in DECIDE.

Mastery gate

What you must demonstrate to advance.

Module 2 tier gate: drive the real MuJoCo rover onto the goal pad within tol (0.12 m) inside the time budget, using a finite-state control(obs) that switches SEEK -> ARRIVE on goal_dist (with an AVOID branch on range) and eases forward to settle rather than orbit (L2 classical DECIDE competency).

Project

How this feeds your build.

This finite-state control(obs) is the DECIDE layer of the capstone autonomy stack: it calls the 2.2 PID per mode, hands SEEK/ARRIVE off to the learned navigation of Module 3 in 5.1, and gets re-implemented on a timer in Module 4 / profiled to the metal in Module 5.
