Perception & World Models — BMI Physical AI Research

Research · Group 01

Machines that see — and model what they see.

An embodied system is only as good as its picture of the world. This group builds the perception stack (depth, detection, segmentation, and multimodal fusion) and the predictive world models that let an autonomous machine act on what it perceives, both in degraded real-world conditions and on the device itself.

The work

How this group works.

The methods and commitments that define the lab's approach to the problem.

Multimodal sensing

Fuse what one sensor can't see

LiDAR, thermal, and RGB combined for perception that holds up in smoke, glare, and the dark — where any single modality fails.

On-device inference

Perception at the edge

Compressed, scheduled models running in real time inside the airframe's power and thermal budget — no cloud round-trip in the control loop.

World models

From pixels to prediction

Self-supervised models that don't just label a scene but predict how it evolves: the substrate that planning and control build on.

Robustness

Outside the lab

Perception measured where it matters: motion blur, occlusion, novel objects, and the long tail that breaks benchmark-tuned models.

Current directions

Open problems we're pursuing.

The questions the lab is taking on now — each a gap between what works in a demo and what works in the world.

Monocular depth on the edge

How far can one camera and a small model go for distance estimation, and where must active sensing take over?

Open-vocabulary detection

Recognizing objects the system was never trained on — table stakes for autonomy in unstructured environments.

Cross-modal calibration

Keeping LiDAR, thermal, and RGB aligned in space and time on a moving platform, without hand-tuning.

Live lab

See it run — on your device.

The group's perception stack, live in your browser: monocular depth and object detection running on-device via WebGPU, on a sample scene or your own webcam. No cloud, no install.

DEPTH ANYTHING V2 · DETECTION · IN-BROWSER

Real neural inference on your device, using a sample image or live webcam

Join this group

Lead this group, or join it.

Step into the principal-investigator role through the Physical AI Investigator Program — an independent appointment with PI authority — or join the group as a research intern.

Bridge to independence

Physical AI Investigator Program

An independent investigator appointment and PI authority to lead a research line in this group — built to position you for early-career funding.

Research in this group can spin out into a company — explore Launch →