Imitation and Sim-to-Real
Clone a demonstrated trajectory with behavioral cloning to RMSE <= 0.12 m and correctly explain why a sim-perfect policy can fail on real hardware.
Try this first — before any explanation.
Same rover, course C3 (an S-curve). Teleop the rover through the S-curve once; the Bench records (state, action) pairs. Train a behavioral-cloning policy to imitate your drive and run it autonomously. The trap: cloning fits your demo beautifully yet the cloned path slowly drifts off your line and into the wall on the second curve. The moment the policy makes a tiny error, the rover is in a state you never demonstrated — so it can't recover, and the error compounds. That drift is distribution shift, the same mechanism that breaks sim-trained policies on real hardware.
Clone the demo, watch it drift, apply a remedy, and explain the sim-to-real gap.
The idea, built visually.
You drove the perfect line, the policy copied you — and on paper it's flawless. So why, when it drives itself, does it wander off? You only ever demonstrated the good line — a narrow tube. The first time the clone slips even slightly outside it, it's in a state you never showed it; it has no idea how to get back, so the error snowballs. That's distribution shift.
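To see the snowball in numbers, here is a minimal sketch under loud assumptions: a toy 1-D model, not the Bench's rover or simulator. The names `expert`, `step`, `clone`, and `rollout`, the gains 0.6 and 1.1, and the 0.02 m tube width are all hypothetical. The clone is a 1-nearest-neighbour imitator: it copies the action of the closest demonstrated state, which is one simple form of behavioral cloning.

```python
# Toy 1-D model (an assumption for illustration, not the Bench's simulator).
# y is the lateral offset from the demo line, in metres.

def expert(y):
    """Hypothetical expert: always steers back toward the line."""
    return -0.6 * y

def step(y, action):
    """Hypothetical plant: on a curve, an uncorrected offset grows 10% per step."""
    return 1.1 * y + action

# The demonstration only covers a narrow tube of +/- 0.02 m around the line.
demo = [(i / 1000, expert(i / 1000)) for i in range(-20, 21)]

def clone(y):
    """Behavioral cloning as 1-nearest-neighbour over the demonstrated states."""
    _, action = min(demo, key=lambda pair: abs(pair[0] - y))
    return action

def rollout(policy, y0, steps=15):
    """Run a policy from offset y0 and return the visited offsets."""
    ys = [y0]
    for _ in range(steps):
        ys.append(step(ys[-1], policy(ys[-1])))
    return ys

inside = rollout(clone, 0.015)          # starts inside the demonstrated tube
outside = rollout(clone, 0.15)          # starts in a state the demo never visited
expert_outside = rollout(expert, 0.15)  # the expert from the same bad state

print(f"clone, start inside tube : final offset {inside[-1]:+.4f} m")
print(f"clone, start outside tube: final offset {outside[-1]:+.4f} m")
print(f"expert, start outside    : final offset {expert_outside[-1]:+.4f} m")
```

Inside the tube the clone behaves like the expert. Outside it, the nearest demonstrated state only ever calls for a tiny correction, so the clone under-steers and the offset keeps growing, while the expert recovers from the same start.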
Now scale up: a policy trained in a clean sim has only ever seen sim states — perfect friction, no slip. The real world hands it states the sim never produced. Same mechanism: the sim is the demonstration, reality is off the line. The cures all do one thing — widen the tube: show recovery from mistakes (DAgger), or randomize the sim so 'normal' already includes reality's mess.
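As a rough illustration of the second cure, the sketch below draws fresh physics parameters for every training episode so that the policy's idea of "normal" already includes some mess. Everything here is an assumption for illustration: the `SimParams` fields, the ranges, and the way the `RANDOMIZE` flag is consumed are hypothetical, not the Bench's actual configuration.

```python
import random
from dataclasses import dataclass

RANDOMIZE = True  # flag name taken from the lesson; its use here is assumed

@dataclass(frozen=True)
class SimParams:
    """Hypothetical per-episode physics settings."""
    friction: float          # 1.0 = the clean sim's perfect grip
    wheel_slip: float        # fraction of commanded motion lost to slip
    sensor_latency_s: float  # delay before the policy sees a new state

# The clean sim: perfect friction, no slip, no latency.
NOMINAL = SimParams(friction=1.0, wheel_slip=0.0, sensor_latency_s=0.0)

def sample_params(rng, randomize=RANDOMIZE):
    """Return the clean sim's parameters, or a random draw if randomize is set."""
    if not randomize:
        return NOMINAL
    return SimParams(
        friction=rng.uniform(0.6, 1.0),          # illustrative range
        wheel_slip=rng.uniform(0.0, 0.15),       # illustrative range
        sensor_latency_s=rng.uniform(0.0, 0.05), # illustrative range
    )

rng = random.Random(0)  # seeded so a training run is repeatable
for episode in range(3):
    print(episode, sample_params(rng))
```

The design choice is to sample once per episode rather than once per run: each episode then exposes the policy to a slightly different world, which widens the set of states it trains on.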
▣ Stage animation: A clean blue demo path traces the S-curve; an amber clone starts on it, makes one tiny error, steps outside a shaded 'demonstrated tube', and spirals into the wall with a counter 0.02->0.09->0.31 m; the tube relabels TRAINING vs a wider lumpier RUN-TIME cloud; a cut to a real rover slipping on a bumpy floor; two fixes (DAgger, domain randomization) widen the tube.
Build it up, step by step.
- Step 1 (worked): record, clone, and overlay demo (blue) vs clone (amber) with live match_rmse.
- Step 2 (worked): plot distance-from-demo vs time and watch it compound.
- Step 3 (faded): pick ONE remedy, either DAgger-lite recovery demos (RECOVERY=True) or domain randomization (RANDOMIZE=True), and re-run under tolerance.
- Step 4 (independent): answer the structured sim-to-real explanation.
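For the recovery-demo remedy in Step 3, the core loop can be sketched as follows. This is a toy under stated assumptions, not the Bench's RECOVERY=True implementation: the 1-D model, the 1-nearest-neighbour clone, and every name and constant are hypothetical. The DAgger idea it illustrates is standard: let the clone drive, record the states it actually reaches, ask the expert what it would have done in each, add those labelled pairs to the dataset, and refit.

```python
# Toy 1-D model (assumed for illustration); y is lateral offset in metres.

def expert(y):
    """Hypothetical expert: steers back toward the line."""
    return -0.6 * y

def step(y, action):
    """Hypothetical plant: uncorrected offset grows 10% per step."""
    return 1.1 * y + action

def make_clone(data):
    """'Train' a 1-nearest-neighbour clone on (state, action) pairs."""
    def clone(y):
        _, action = min(data, key=lambda pair: abs(pair[0] - y))
        return action
    return clone

def rollout(policy, y0, steps):
    ys = [y0]
    for _ in range(steps):
        ys.append(step(ys[-1], policy(ys[-1])))
    return ys

# Initial dataset: one clean demo that stays within +/- 0.02 m of the line.
data = [(i / 1000, expert(i / 1000)) for i in range(-20, 21)]
before = rollout(make_clone(data), 0.15, 30)

# DAgger-style loop: the clone drives, the expert labels the visited states.
for _ in range(3):
    visited = rollout(make_clone(data), 0.15, 20)
    data += [(y, expert(y)) for y in visited]  # aggregate, then refit next pass

after = rollout(make_clone(data), 0.15, 30)

print(f"before recovery data: final offset {before[-1]:+.4f} m")
print(f"after recovery data : final offset {after[-1]:+.4f} m")
```

Note where the new data comes from: the states the clone reached on its own, not more laps of the clean line. That is what widens the tube.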
How the Bench grades your run.
PASS WHEN Trajectory-match RMSE <= 0.12 m and max drift <= 0.20 m with no collision, AND the concept check names the mechanism (distribution shift), its sim->real cause, and a valid mitigation with its effect.
- FAIL: match_rmse above 0.12 m and the rover collided. This is compounding drift; add recovery demos at the states where it drifts (curve 2), not where it already matches.
- FAIL: max drift above 0.20 m. A single demo is too narrow; collect recovery data or train with randomization.
- FAIL concept: you described the symptom but not the mechanism. Name why a small error grows (the policy enters states absent from its training data and can't recover).
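The numeric half of the pass rule can be sketched as below. This assumes the demo and clone paths are sampled at the same timesteps and compared point by point; the Bench's actual alignment method is not specified here, and the function names `match_rmse`, `max_drift`, and `run_passes` are hypothetical. The concept check is not modelled.

```python
import math

def distances(demo, clone):
    """Pointwise distance (m) between time-aligned (x, y) paths."""
    if len(demo) != len(clone):
        raise ValueError("paths must be sampled at the same timesteps")
    return [math.hypot(cx - dx, cy - dy)
            for (dx, dy), (cx, cy) in zip(demo, clone)]

def match_rmse(demo, clone):
    """Root-mean-square of the pointwise distances."""
    d = distances(demo, clone)
    return math.sqrt(sum(x * x for x in d) / len(d))

def max_drift(demo, clone):
    """Largest single distance from the demo path."""
    return max(distances(demo, clone))

def run_passes(demo, clone, collided, rmse_tol=0.12, drift_tol=0.20):
    """Numeric part of the pass rule: both tolerances met and no collision."""
    return (not collided
            and match_rmse(demo, clone) <= rmse_tol
            and max_drift(demo, clone) <= drift_tol)

# Illustrative three-sample paths; the drifting offsets reuse the
# 0.02 -> 0.09 -> 0.31 m counter from the stage animation.
demo_path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
drifting = [(0.0, 0.02), (1.0, 0.09), (2.0, 0.31)]
tight = [(0.0, 0.01), (1.0, 0.03), (2.0, 0.05)]

for name, path in [("drifting", drifting), ("tight", tight)]:
    print(name,
          f"rmse={match_rmse(demo_path, path):.3f}",
          f"max_drift={max_drift(demo_path, path):.3f}",
          "PASS" if run_passes(demo_path, path, collided=False) else "FAIL")
```

RMSE and max drift catch different failures: a path can average close to the demo while still swinging wide once, which is why the rule bounds both.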
Bring back what you've already mastered.
- From 3.1: behavioral cloning is the 3.1 classifier with a different label source — what replaced the expert's hand-labels? (Your own teleop demonstration.)
- From 3.2: unlike the RL policy, BC needed no reward, but it only ever saw the demo's states. Which of the two suffers distribution shift more, and why? (BC: it never explores off-demo.)
- From 2.1: name two real-world state perturbations the sim never generated. (Any two of: wheel slip, friction drift, sensor latency.)
What you must demonstrate to advance.
Module 3 exit gate: BC policy meets RMSE <= 0.12 m, completes C3 with no collision, bounds drift <= 0.20 m, and passes the three-concept sim-to-real check (L3: clone to tolerance and reason about the sim-to-real gap).
How this feeds your build.
Closes Module 3's capstone contribution: the classifier (mode selector), RL policy (learned navigation), and sim-to-real reasoning integrate into 5.1; learned navigation ships as doorway_policy(state) -> Command(heading=...).