The Improvement Loop: Measure → Tune → Re-Sim
Diagnose a failing device from sim telemetry and adjust the correct parameter(s) in the correct direction to bring it to spec — and prove the change was driven by the measured data, not by guessing.
Try this first — before any explanation.
The Bench opens with the latch-arm-v1 rig (a clean single-DOF tuning device, NOT the M4 slider-crank) already loaded, assembled, actuated, and already failing its spec. A red OUT OF SPEC chip sits top-right; a telemetry strip live-plots theta(t), theta_dot(t), and tau_act(t). The pinned spec card:
- S1 settle_time ≤ 0.80 s
- S2 overshoot ≤ 5.0 %
- S3 steady_error ≤ 1.0°
- S4 hold_under_load: 0.15 N·m disturbance at t = 2.0 s, |Δθ| ≤ 3°
- S5 |tau_act| < 2.0 N·m for ≥ 95% of the run
First run: settle_time 1.42 s (S1 FAIL), overshoot 23% (S2 FAIL), the rest PASS. VERDICT: OUT OF SPEC.
Editable params: arm_len, arm_thick, spring_k, actuator_kp, actuator_kv, stop_angle.
Your goal, BEFORE watching anything: edit the panel, hit Re-Sim, and get all five checks to PASS. You are NOT told which knob to turn. Most learners' first instinct is to crank actuator_kp to 'get there faster', but the telemetry is engineered so that makes overshoot WORSE. That productive failure sets up the Model.
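The spec card is five threshold comparisons, so it can be sketched as a pure function over a metrics dict. This is a minimal illustration, not the Bench's grader: the key names (`settle_time`, `overshoot_pct`, `saturated_frac`, and so on) are hypothetical, and S5 is rewritten as "the fraction of the run with |tau_act| ≥ 2.0 N·m is at most 5%". In `first_run`, steady_error 0.4° comes from the Step A table; the S4 and S5 values are made-up placeholders that merely satisfy "the rest PASS".

```python
# Hypothetical sketch of the latch-arm-v1 spec card; key names are assumptions.
SPEC_LIMITS = {
    "settle_time": 0.80,       # S1, seconds
    "overshoot_pct": 5.0,      # S2, percent of travel
    "steady_error_deg": 1.0,   # S3, degrees
    "hold_dtheta_deg": 3.0,    # S4, |Δθ| after the 0.15 N·m disturbance at t = 2.0 s
    "saturated_frac": 0.05,    # S5, |tau_act| < 2.0 N·m for >= 95% of the run
}
CHECK_IDS = {
    "settle_time": "S1",
    "overshoot_pct": "S2",
    "steady_error_deg": "S3",
    "hold_dtheta_deg": "S4",
    "saturated_frac": "S5",
}


def check_spec(metrics: dict) -> dict:
    """Return {"S1": bool, ..., "S5": bool}; True means that check PASSES."""
    return {CHECK_IDS[key]: metrics[key] <= limit for key, limit in SPEC_LIMITS.items()}


def verdict(metrics: dict) -> str:
    return "IN SPEC" if all(check_spec(metrics).values()) else "OUT OF SPEC"


first_run = {
    "settle_time": 1.42,
    "overshoot_pct": 23.0,
    "steady_error_deg": 0.4,
    "hold_dtheta_deg": 1.0,   # placeholder (S4 passes)
    "saturated_frac": 0.0,    # placeholder (S5 passes)
}
print(check_spec(first_run))  # {'S1': False, 'S2': False, 'S3': True, 'S4': True, 'S5': True}
print(verdict(first_run))     # OUT OF SPEC
```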
latch-arm-v1: a single-DOF revolute hinge driven by a torsion-spring + position actuator, modelled parametrically in numpy (in the full Bench this is CadQuery geometry + MuJoCo-WASM dynamics). Deterministic seed 0xD2C1, 3.0s rollout. Tune ONE knob — actuator_kv (damping) — and Re-Sim until all five spec checks PASS. Cranking actuator_kp makes overshoot worse; the data points at kv.
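A minimal numpy stand-in for such a rig, assuming a rigid arm of inertia I, a PD position actuator tau_act = kp·(target − θ) − kv·θ̇, and a torsion spring torque −spring_k·θ pulling back toward 0°. Every constant below is hypothetical, the t = 2.0 s disturbance and torque saturation are left out, and the printed numbers will not reproduce the Bench's 23% / 1.42 s. Settle time here means "stays within 2% of travel of the final angle", which is also an assumption about the metric.

```python
import numpy as np

# Illustrative constants only; NOT latch-arm-v1's real parameters.
INERTIA = 0.02      # kg·m², hinge inertia (assumed)
SPRING_K = 0.02     # N·m/rad, torsion spring pulling the arm toward 0 (assumed)
TARGET_DEG = 35.0   # commanded angle (assumed equal to a 35° stop_angle)


def rollout(kp, kv, spring_k=SPRING_K, inertia=INERTIA,
            target_deg=TARGET_DEG, t_end=3.0, dt=1e-3):
    """Integrate I*theta_dd = kp*(target - theta) - kv*theta_d - spring_k*theta.

    Starts at rest at theta = 0. Returns (t, theta) in seconds and radians.
    """
    target = np.deg2rad(target_deg)
    n = int(round(t_end / dt)) + 1
    t = np.arange(n) * dt
    theta = np.zeros(n)
    omega = 0.0
    for i in range(1, n):
        tau_act = kp * (target - theta[i - 1]) - kv * omega   # PD position actuator
        tau_spring = -spring_k * theta[i - 1]                 # torsion spring
        omega += dt * (tau_act + tau_spring) / inertia        # semi-implicit Euler:
        theta[i] = theta[i - 1] + dt * omega                  # velocity first, then angle
    return t, theta


def overshoot_and_settle(t, theta, target_deg=TARGET_DEG, band=0.02):
    """Overshoot in % of travel, and the time after which theta stays
    within band*travel of its final value."""
    target = np.deg2rad(target_deg)
    travel = target - theta[0]
    overshoot_pct = max(0.0, float((theta.max() - target) / travel * 100.0))
    outside = np.flatnonzero(np.abs(theta - theta[-1]) > band * travel)
    settle_time = 0.0 if outside.size == 0 else float(t[min(outside[-1] + 1, t.size - 1)])
    return overshoot_pct, settle_time


for kp, kv in [(2.0, 0.1), (4.0, 0.1), (2.0, 0.3)]:
    os_pct, ts = overshoot_and_settle(*rollout(kp, kv))
    print(f"kp={kp:.1f} kv={kv:.1f}  overshoot={os_pct:5.1f}%  settle={ts:.2f}s")
```

With these made-up constants, doubling kp raises overshoot, while tripling kv lowers both overshoot and settle time: the same directional story the Bench telemetry is built to tell.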
The idea, built visually.
Your device just failed. A worse engineer guesses; a better one reads. Telemetry isn't decoration — it's a confession. This curve overshoots the target by twenty-three percent, then takes more than a second to settle. Overshoot has a meaning: the push driving the arm is winning against whatever should be slowing it down. Too much go, not enough whoa. Cranking the gain to 'get there faster' — the obvious move — makes overshoot worse. The fix is the other dial: damping. More whoa, not more go. This is the loop: measure the gap, form ONE hypothesis about ONE parameter, change only that one, re-simulate. If you change three knobs at once and it passes, you've learned nothing — you can't repeat it. One change, driven by the data, re-sim. That's not luck — that's engineering you can defend. Now go close the loop on yours. (3:10 target, ElevenLabs calm-precise narrator; the video FRAMES the Bench attempt and never shows the full solution — the learner still finds which knob and how far.)
▣ Stage animation: Cold open on black: a single red OUT OF SPEC chip fades in centered, holds 1s. The chip slides up; three telemetry traces (theta, theta_dot, tau) draw left-to-right in teal #2BD9C4, the theta curve overshooting a shaded target band then ringing back in. Zoom the overshoot hump: a tall green PUSH arrow dwarfs a short orange STOP arrow. Split-screen two control-gain dials — kp labelled PUSH (stiffness), kv labelled WHOA (damping); turning kp up grows the hump, turning kv up shrinks it and drops settle time. A 4-box loop lights in sequence MEASURE → HYPOTHESIZE → CHANGE ONE → RE-SIM, the word ONE pulsing. Telemetry replays with kv raised: hump flattens under the band, settle marker slides left past the 0.80s line which flips orange→blue, IN SPEC chip fades in. Outro card: 'Read → Hypothesize → Change ONE → Re-Sim.' Deep-navy canvas, blue-led accents, IBM Plex Mono on every on-screen number, restrained kinetic type.
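"Too much go, not enough whoa" is the damping ratio of a second-order system. Assuming the hinge behaves like I·θ̈ + kv·θ̇ + (kp + spring_k)·θ = kp·target (a PD actuator plus a spring toward zero, which is an assumption about latch-arm-v1's internals), ζ = kv / (2·√(I·(kp + spring_k))). kp sits under the square root in the denominator, so raising it lowers ζ; raising kv raises ζ directly. The textbook peak-overshoot formula then turns ζ into a percentage. The inertia below is hypothetical.

```python
import math


def damping_ratio(kp, kv, inertia, spring_k=0.0):
    """zeta for I*theta_dd + kv*theta_d + (kp + spring_k)*theta = kp*target."""
    return kv / (2.0 * math.sqrt(inertia * (kp + spring_k)))


def peak_overshoot_pct(zeta):
    """Peak overshoot of an underdamped 2nd-order step response, in % of its final value."""
    if zeta >= 1.0:
        return 0.0  # critically damped or overdamped: no overshoot
    return 100.0 * math.exp(-math.pi * zeta / math.sqrt(1.0 - zeta ** 2))


def min_zeta_for(overshoot_limit_pct):
    """Smallest zeta whose predicted peak overshoot stays within the limit."""
    ln_os = math.log(overshoot_limit_pct / 100.0)
    return -ln_os / math.sqrt(math.pi ** 2 + ln_os ** 2)


INERTIA = 0.02  # kg·m², hypothetical
for kp, kv in [(2.0, 0.1), (4.0, 0.1), (2.0, 0.3)]:
    z = damping_ratio(kp, kv, INERTIA)
    print(f"kp={kp} kv={kv}  zeta={z:.2f}  overshoot~{peak_overshoot_pct(z):.1f}%")

print(f"S2 (<= 5%) needs zeta >= {min_zeta_for(5.0):.2f}")  # 0.69
```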
Build it up, step by step.
Step A (fully scaffolded worked example): the Bench shows a filled-in diagnosis table.
- overshoot 23% (S2 FAIL) MEANS push >> damping → candidate knob actuator_kv → INCREASE
- settle_time 1.42 s (S1 FAIL) MEANS the arm oscillates before resting → actuator_kv (same knob) → INCREASE
- steady_error 0.4° (PASS) MEANS the aim is fine → leave actuator_kp → HOLD
Takeaway: the two failing checks point at the SAME knob. actuator_kv (damping) is too low, so raise it. Do NOT touch kp: it governs the aim, which already works, so changing it only risks breaking a passing check.
Step B (partial scaffold, predict-then-run): the panel locks all but one parameter to enforce 'change ONE.' The learner types a prediction ('raising kv should ____ overshoot and ____ settle_time' → 'decrease / decrease'), makes exactly one edit (raise actuator_kv), and Re-Sims. Expected first result at kv ≈ 0.35: overshoot drops sharply and settle_time passes, but overshoot is still just over the limit. Scaffold prompt: 'Your hypothesis held; you're close, but S2 still fails by a little. Nudge the SAME knob a little further; don't jump to a new one.'
Step C (scaffold removed): the full panel unlocks, with no table and no prediction field, just the spec card, telemetry, and Re-Sim. The learner converges (e.g. actuator_kv ≈ 0.50) until all five checks pass. A change-log auto-records every edit plus the resulting metrics; Gate G grades this log.
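Steps B and C reduce to a loop: move one knob by one step, re-sim, log the edit with its metrics, and stop once the target check passes. The sketch below is hypothetical: the real Re-Sim is not available here, so `stand_in_sim` substitutes the textbook second-order overshoot prediction with made-up constants, `passes` checks only S2 for brevity, and the names and log schema are illustrative.

```python
import math

INERTIA, SPRING_K = 0.02, 0.02  # hypothetical constants, not the Bench's


def stand_in_sim(params):
    """Stand-in for Re-Sim: predicted overshoot from the 2nd-order damping ratio."""
    k_total = params["actuator_kp"] + SPRING_K
    zeta = params["actuator_kv"] / (2.0 * math.sqrt(INERTIA * k_total))
    if zeta >= 1.0:
        return {"overshoot_pct": 0.0}
    return {"overshoot_pct": 100.0 * math.exp(-math.pi * zeta / math.sqrt(1.0 - zeta ** 2))}


def tune_one_knob(params, knob, step, simulate, passes, max_iters=20):
    """Move ONE knob by `step` per edit, re-simulating and logging after every edit."""
    params = dict(params)
    change_log = []
    metrics = simulate(params)
    for _ in range(max_iters):
        if passes(metrics):
            break
        old = params[knob]
        params[knob] = round(old + step, 6)  # round so the log shows clean values
        metrics = simulate(params)
        change_log.append({"changes": {knob: (old, params[knob])}, **metrics})
    return params, metrics, change_log


start = {"actuator_kp": 2.0, "actuator_kv": 0.1}
final, m, log = tune_one_knob(start, "actuator_kv", 0.05, stand_in_sim,
                              passes=lambda mm: mm["overshoot_pct"] <= 5.0)
for entry in log:
    print(entry)
print("final kv:", final["actuator_kv"], "overshoot: %.1f%%" % m["overshoot_pct"])
```

Every log entry touches exactly one parameter in one direction, which is what makes the final result attributable to the data.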
How the Bench grades your run.
PASS WHEN all five spec checks PASS at seed 0xD2C1 (settle_time ≤ 0.80 s, overshoot ≤ 5.0%, steady_error ≤ 1.0°, hold_under_load ≤ 3°, saturation (|tau_act| ≥ 2.0 N·m) for ≤ 5% of the run) AND Gate G holds: the change-log's decisive edit was on a telemetry-implicated parameter, moved in the implicated direction (here, raising actuator_kv). Passing the spec by brute-forcing every knob does NOT pass G.
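Gate G can be phrased as a check over the change-log: find the decisive edit, require that it moved exactly one parameter, and require that the parameter and direction match what the telemetry implicated. A hypothetical sketch; the log schema and the definition of "decisive" (here, the first edit after which the spec passed) are assumptions, not the Bench's actual grader. The kv values 0.35 and 0.50 echo Steps B and C; the brute-force values are made up.

```python
def gate_g(change_log, implicated_param="actuator_kv", implicated_sign=+1):
    """Return (ok, reason). Each entry: {"changes": {param: (old, new)}, "passed": bool}."""
    decisive = next((e for e in change_log if e["passed"]), None)
    if decisive is None:
        return False, "spec never met"
    changes = decisive["changes"]
    if len(changes) != 1:
        return False, f"decisive edit moved {len(changes)} parameters at once"
    param, (old, new) = next(iter(changes.items()))
    if param != implicated_param or (new - old) * implicated_sign <= 0:
        return False, f"decisive edit {param}: {old} -> {new} is not the implicated move"
    return True, f"decisive edit {param}: {old} -> {new} matches the telemetry"


evidence_driven = [
    {"changes": {"actuator_kv": (0.10, 0.35)}, "passed": False},
    {"changes": {"actuator_kv": (0.35, 0.50)}, "passed": True},
]
brute_force = [
    {"changes": {"actuator_kp": (1.0, 0.6), "actuator_kv": (0.1, 0.5),
                 "spring_k": (0.2, 0.1), "arm_len": (0.08, 0.07)}, "passed": True},
]
print(gate_g(evidence_driven))
print(gate_g(brute_force))
```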
- S2 overshoot 18% (limit 5%). You raised actuator_kp — that adds PUSH, which grows overshoot. The telemetry shows too little damping: raise actuator_kv instead.
- S2 overshoot 6.2% — close, limit is 5.0%. Same knob, same direction: nudge actuator_kv up ~0.05 more and re-sim.
- Spec met, but the change isn't defensible: you moved 4 parameters at once, so the fix can't be attributed to the data. Reset, change ONE implicated knob, and re-sim. A fix you can't explain isn't a fix.
- S5 saturation: tau hit 2.0 N·m for 11% of the run. You're commanding more torque than the actuator can deliver. Lower actuator_kp or raise actuator_kv to ask for less force.
- PASS — settle, overshoot, steady-error, hold, and saturation all in spec. One change, driven by the data, re-simulated. That's engineering you can defend.
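The feedback lines above behave like a priority-ordered rule table. One hypothetical way to choose among them, assuming the grader sees the latest metrics plus the latest edit; the priority order, the 7.5% "close" threshold, the generic fallback line, and the shortened message wording are illustrative choices, not the Bench's rules.

```python
def pick_feedback(m, last_changes):
    """m: metrics dict; last_changes: {param: (old, new)} for the most recent edit."""
    raised = {p for p, (old, new) in last_changes.items() if new > old}
    if m["overshoot_pct"] > 5.0:
        if "actuator_kp" in raised:
            return "S2: raising actuator_kp adds PUSH; raise actuator_kv instead."
        if m["overshoot_pct"] <= 7.5:
            return "S2: close. Same knob, same direction: nudge actuator_kv up and re-sim."
        return "S2: too much overshoot; the telemetry shows too little damping."
    if m["saturated_frac"] > 0.05:
        return "S5: torque demand too high; lower actuator_kp or raise actuator_kv."
    if (m["settle_time"] > 0.80 or m["steady_error_deg"] > 1.0
            or m["hold_dtheta_deg"] > 3.0):
        return "Spec still failing: read the failing check before choosing a knob."
    if len(last_changes) > 1:
        return "Spec met, but you moved several parameters at once: reset and change ONE."
    return "PASS: one change, driven by the data, re-simulated."


ok_run = {"overshoot_pct": 3.0, "saturated_frac": 0.0, "settle_time": 0.6,
          "steady_error_deg": 0.4, "hold_dtheta_deg": 1.0}
print(pick_feedback(dict(ok_run, overshoot_pct=18.0), {"actuator_kp": (1.0, 2.0)}))
print(pick_feedback(ok_run, {"actuator_kv": (0.35, 0.50)}))
```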
Bring back what you've already mastered.
- (M2 — constraints/ranges) Before you push actuator_kp higher: the M2 guard says arm_thick ≥ 0.006. If raising kp led you to thin the arm to drop inertia, which guard would you violate? Type the failing constraint. Reconnects tuning to the valid design envelope.
- (M3 — DOF) How many degrees of freedom does this latch arm have, and about which axis does theta measure? Answer: 1 DOF, revolute about hinge z. Keeps the kinematic model alive under the physics.
- (M4 — metric definition) overshoot is computed relative to travel, not to stop_angle. If stop_angle were 70° instead of 35°, would the same absolute overshoot (the same number of degrees past the stop) give a larger or smaller overshoot %? Answer: smaller, because the travel denominator is larger. Reinforces that a metric is a definition, not a vibe.
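The M4 question is plain arithmetic once the definition is written down. A minimal sketch, assuming overshoot % = (peak − stop_angle) / travel × 100 with travel measured from a start angle of 0° (the start angle and the 3° excursion are illustrative assumptions):

```python
def overshoot_pct(peak_deg, stop_angle_deg, start_deg=0.0):
    """Overshoot as a percentage of travel (start -> stop_angle), floored at 0."""
    travel = stop_angle_deg - start_deg
    return max(0.0, (peak_deg - stop_angle_deg) / travel * 100.0)


# The same 3° excursion past the stop, for two stop angles:
short_travel = overshoot_pct(38.0, 35.0)   # 3/35 of travel
long_travel = overshoot_pct(73.0, 70.0)    # 3/70 of travel
print(f"{short_travel:.2f}% vs {long_travel:.2f}%")  # 8.57% vs 4.29%
```

Note that the same physical 3° excursion fails S2 at a 35° stop but passes it at a 70° stop, which is exactly why the definition matters.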
What you must demonstrate to advance.
To unlock 5.2, demonstrate IN SIM: (1) bring latch-arm-v1 from OUT OF SPEC to all five checks PASS at seed 0xD2C1; (2) pass Gate G — the change-log shows the decisive edit was on a telemetry-implicated parameter in the implicated direction (evidence-driven, not brute-forced); and (3) pass a transfer probe — a second, unseen failing instance (seed 0xD2C1 with spring_k perturbed so the device now under-shoots and lands with steady_error ≈ 4.2°). Recognize the symptom changed (steady-state error, not overshoot) and pick the now-correct knob (actuator_kp↑ or spring_k↓), proving you're reading data, not pattern-matching last time's answer. Passing all three demonstrates the L2 competency: diagnose from telemetry and correct on evidence.
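Why the transfer probe needs a different knob: at rest θ̇ = 0, so the kv term exerts no torque and only the static balance kp·(target − θ) = spring_k·θ decides where the arm lands. Assuming the spring pulls toward θ = 0 and the actuator is a pure PD with no integral term (both assumptions about the rig), the residual error is target·spring_k / (kp + spring_k). The gains below are hypothetical.

```python
def steady_error_deg(target_deg, kp, spring_k):
    """Resting error from the static balance kp*(target - theta) = spring_k*theta.

    actuator_kv does not appear: at rest theta_dot = 0, so damping exerts no torque.
    """
    theta_rest = kp * target_deg / (kp + spring_k)
    return target_deg - theta_rest


base = steady_error_deg(35.0, kp=2.0, spring_k=0.3)     # hypothetical perturbed spring
more_kp = steady_error_deg(35.0, kp=4.0, spring_k=0.3)  # actuator_kp up
less_k = steady_error_deg(35.0, kp=2.0, spring_k=0.1)   # spring_k down
print(f"{base:.2f}°, kp up: {more_kp:.2f}°, spring_k down: {less_k:.2f}°")
```

Under the same assumed model, the actuator's torque at t = 0 is kp × travel (in radians), so any kp increase made to fix steady error should be followed by a look at S5.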
How this feeds your build.
This lesson IS the engine of the capstone. The diagnose→hypothesize→change-one→re-sim loop you just proved is exactly the optimization phase you run in 5.2 — except there, no one hands you the failing instance or the parameter panel: you define the spec, you build the model, and you drive it past target.