Plan With a State Machine
Design a finite-state controller as a Python function control(obs) that switches between modes (SEEK far away, ARRIVE up close, AVOID near a wall) to drive the real MuJoCo rover onto the goal pad and settle within tolerance.
Try this first — before any explanation.
Your PID instinct from 2.2 says: one law, run it forever. So you write a single SEEK rule — full speed, steer toward goal_bearing — and let it rip. The rover charges across the arena and reaches the pad... then sails straight through it, swings back, and orbits the 0.12 m circle forever, full throttle the whole way. It never lands. One behavior can't both sprint there AND ease in. Those are two different jobs — and a wall in the way would be a third.
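The single-rule attempt described above might look like the sketch below. The speed 0.6 and steering gain 1.2 are the values this lesson uses for SEEK. The (left wheel, right wheel) return order is an assumption inferred from the steering hint that a goal to the left should speed up the right wheel.

```python
def control(obs):
    """Single SEEK rule: constant forward speed, steer toward the goal."""
    b = obs["goal_bearing"]   # positive means the goal is to the left
    forward = 0.6             # never shrinks, however close the pad is
    turn = 1.2 * b
    # Assumed return order: (left wheel, right wheel).
    return (forward - turn, forward + turn)


# The command at 3 m and at 10 cm is identical: nothing here reads goal_dist.
far = control({"goal_dist": 3.0, "goal_bearing": 0.0})
near = control({"goal_dist": 0.10, "goal_bearing": 0.0})
print(far == near)  # → True
```

Because goal_dist is never read, the rover has no way to slow down as the pad approaches, which is why it overshoots and orbits.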
Write control(obs) as a finite-state controller — SEEK, then ARRIVE, with an AVOID branch — and land the real MuJoCo rover on the green pad.
The idea, built visually.
Watch what your one rule does: it's perfect at 3 metres and disastrous at 10 centimetres. The exact command that gets you there — full speed, hard steer — is the command that flings you off the pad. No single law is right everywhere.
So stop asking one rule to do everything. Give each job its own room. In SEEK you only sprint and aim. In ARRIVE you only ease in — forward speed shrinks as goal_dist shrinks, so you coast to a stop on the pad instead of through it. And if a wall fills the rangefinder, AVOID takes over and pivots. Each mode is trivially simple because it has exactly one job. The intelligence isn't inside any room — it's in the doors: when goal_dist drops below half a metre, SEEK hands off to ARRIVE.
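The "doors" can be written down separately from the rooms. The sketch below is one possible mode selector, not the Bench's reference solution. The two thresholds come from this lesson. The name pick_mode, the string labels, and the choice to test AVOID first (so a wall overrides both goal modes) are assumptions made for illustration.

```python
SEEK, ARRIVE, AVOID = "SEEK", "ARRIVE", "AVOID"

ARRIVE_DIST = 0.5   # goal_dist below half a metre opens the door to ARRIVE
AVOID_RANGE = 0.35  # rangefinder reading below this opens the door to AVOID


def pick_mode(goal_dist, rng):
    """Choose the mode from the current sensor reads (hypothetical helper)."""
    # Assumption: AVOID has priority over both goal-directed modes.
    # The lesson gates AVOID on 0 < range, so a reading of 0 never triggers it.
    if 0.0 < rng < AVOID_RANGE:
        return AVOID
    if goal_dist < ARRIVE_DIST:
        return ARRIVE
    return SEEK


print(pick_mode(3.0, 2.0))   # → SEEK
print(pick_mode(0.3, 2.0))   # → ARRIVE
print(pick_mode(0.3, 0.2))   # → AVOID
```

Note that this selector keeps no memory: the mode is recomputed from goal_dist and range on every call, so the rover can fall back from ARRIVE to SEEK if it drifts past the half-metre mark.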
▣ Stage animation: One control law drawn as a single arrow that nails a far target then whips into a tight orbit around the green pad, ring counter climbing; the law splits into three labelled nodes SEEK / ARRIVE / AVOID, each holding one short line; directed arrows appear between them tagged with the trigger reads (goal_dist < 0.5 -> ARRIVE, range < 0.35 -> AVOID); ARRIVE's forward term visibly tapers as the rover glides onto the pad and the trajectory dot stops dead inside the circle.
Build it up, step by step.
Step A (worked): read d = obs['goal_dist'], b = obs['goal_bearing'], and rng = obs['range'] at the top of control(obs). These three reads pick the mode.
Step B (fill-the-blank): write the SEEK branch (forward = 0.6, turn = 1.2 * b, return (forward - turn, forward + turn)) and the door to ARRIVE (taken when goal_dist < ~0.5).
Step C (independent): write ARRIVE so that forward tapers with distance (forward = k * d) and the rover settles inside tol, then add the AVOID branch (pivot in place when 0 < range < ~0.35).
Remember: positive goal_bearing means the goal is to your LEFT.
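If you want something to compare your own attempt against, the three steps could come together roughly as follows. Treat it as a sketch under assumptions rather than the graded answer: the gain K_ARRIVE, the pivot rate PIVOT, and the choice to command zero once inside tol are all illustrative values not given in this lesson, and the return order is assumed to be (left wheel, right wheel).

```python
TOL = 0.12      # pad tolerance from the pass condition
K_ARRIVE = 1.2  # assumed gain: 1.2 * 0.5 m = 0.6, so speed is continuous at the door
PIVOT = 0.5     # assumed pivot rate for AVOID


def control(obs):
    # Step A: the three reads that pick the mode.
    d = obs["goal_dist"]
    b = obs["goal_bearing"]   # positive means the goal is to the left
    rng = obs["range"]

    # AVOID: a wall fills the rangefinder, so pivot in place.
    if 0.0 < rng < 0.35:
        return (-PIVOT, PIVOT)

    turn = 1.2 * b            # same steering term in SEEK and ARRIVE

    if d < 0.5:
        # ARRIVE: forward speed tapers with the remaining distance.
        if d < TOL:
            return (0.0, 0.0)  # assumption: stop once inside tol
        forward = K_ARRIVE * d
    else:
        # SEEK: sprint and aim.
        forward = 0.6

    return (forward - turn, forward + turn)  # (left wheel, right wheel)
```

Choosing K_ARRIVE so that K_ARRIVE * 0.5 equals the SEEK speed avoids a jump in the forward command at the handoff. Any other gain would still taper, but the rover would lurch when it crosses the door.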
How the Bench grades your run.
PASS WHEN: the rover drives onto the goal pad and comes within tol (0.12 m) inside the time budget, using a finite-state controller whose mode is chosen from goal_dist (and range), with an ARRIVE mode that eases forward to settle on the pad.
- FAIL: closest approach is far from the pad — your steering sign is off. Positive goal_bearing means the goal is to your LEFT, so increase the right wheel and decrease the left.
- FAIL: you reach the pad's neighbourhood but never settle inside tol — a single full-speed SEEK orbits the circle. Add an ARRIVE mode that scales forward with goal_dist (forward = k * goal_dist) so it coasts to a stop on the pad.
- FAIL: stalled or drove into the wall — gate an AVOID mode on 0 < obs['range'] < ~0.35 that pivots in place (return (-turn, turn)) instead of driving forward.
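The three FAIL hints above can be turned into quick local probes to run before a Bench attempt. The helper below is hypothetical and is not the Bench grader: it only calls control(obs) on a few hand-made observations. It assumes a (left wheel, right wheel) return, treats the mean of the two wheels as forward speed, and uses range = 0.0 to stand for "no wall in the AVOID gate".

```python
def preflight(control):
    """Return a list of likely FAIL reasons for a control(obs) callable."""
    problems = []

    # Steering sign: a goal to the left (positive bearing) should speed up the right wheel.
    left, right = control({"goal_dist": 3.0, "goal_bearing": 0.5, "range": 0.0})
    if not right > left:
        problems.append("steering sign")

    # Settling: forward speed should be lower near the pad than far from it.
    def forward_at(d):
        l, r = control({"goal_dist": d, "goal_bearing": 0.0, "range": 0.0})
        return (l + r) / 2.0

    if not forward_at(0.15) < forward_at(3.0):
        problems.append("no ARRIVE taper")

    # Wall: inside the AVOID gate the rover should pivot, not advance.
    l, r = control({"goal_dist": 3.0, "goal_bearing": 0.0, "range": 0.2})
    if (l + r) / 2.0 > 0.0 or l == r:
        problems.append("no AVOID pivot")

    return problems


def seek_only(obs):
    """The single-rule controller, for comparison."""
    turn = 1.2 * obs["goal_bearing"]
    return (0.6 - turn, 0.6 + turn)


print(preflight(seek_only))  # → ['no ARRIVE taper', 'no AVOID pivot']
```

An empty list does not guarantee a PASS, since these probes never run the simulator. They only catch the three mistakes named in the FAIL hints.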
Bring back what you've already mastered.
- From 2.2: ARRIVE reuses the same steering term but scales forward speed down near the goal — is that the P, I, or D idea? (Proportional: command proportional to remaining distance.)
- From 2.1: why read goal_dist and range once at the top of control(obs) and branch on them? (They are the SENSE/PERCEIVE values; the mode switch lives in DECIDE.)
- From Module 1: map SEEK, ARRIVE, AVOID onto SENSE->PERCEIVE->DECIDE->ACT — each mode runs the whole loop; the state machine lives in DECIDE.
What you must demonstrate to advance.
Module 2 tier gate: drive the real MuJoCo rover onto the goal pad within tol (0.12 m) inside the time budget, using a finite-state control(obs) that switches SEEK -> ARRIVE on goal_dist (with an AVOID branch on range) and eases forward to settle rather than orbit (L2 classical DECIDE competency).
How this feeds your build.
This finite-state control(obs) is the DECIDE layer of the capstone autonomy stack: it calls the 2.2 PID per mode, hands SEEK/ARRIVE off to the learned navigation of Module 3 in 5.1, and gets re-implemented on a timer in Module 4 / profiled to the metal in Module 5.