Initial xArm6 WorldBelief perception stack by jhengyilin · Pull Request #2665 · dimensionalOS/dimos

jhengyilin · 2026-06-30T16:56:35Z

WorldBelief live object identity for xArm6 manipulation

flowchart TB
    camera([RealSense RGB-D camera])

    detector["YOLO-E prompt detector<br/><i>prompt mode, tracker disabled in xArm blueprint</i>"]
    osr["ObjectSceneRegistration<br/><i>RGB-D objects + CLIP/DINO crop embeddings</i>"]
    wb["WorldBelief<br/><i>stable IDs, support windows, re-acquisition</i>"]
    rerun["Rerun<br/><i>annotated image + 3D workspace</i>"]
    manip["PickAndPlaceModule<br/><i>reads present objects</i>"]
    recorder["XArm6WorldBeliefRecorder<br/><i>records replay-critical streams</i>"]

    subgraph m2 ["memory2"]
        direction TB
        subgraph session ["per-run recording DB"]
            direction LR
            color[(color_image)]
            depth[(depth_image)]
            info[(camera_info)]
            det2d[(detections_2d)]
            det3d[(detections_3d)]
            audit[(worldbelief_audit)]
        end
        subgraph history ["worldbelief_history.db"]
            direction LR
            object_events[(object evidence)]
            semantic_vec[(semantic vectors)]
            visual_vec[(visual vectors)]
        end
    end

    camera --> detector
    detector --> osr
    camera --> osr
    osr -->|"frame objects"| wb
    osr -->|"annotated image + current-frame pointcloud"| rerun
    wb -->|"present objects"| manip
    wb -->|"detections_3d + audit"| rerun
    wb ==> object_events
    wb ==> semantic_vec
    wb ==> visual_vec
    object_events -. "rehydrate compact identity state on restart" .-> wb
    camera ==> recorder
    osr ==> recorder
    wb ==> recorder
    recorder ==> color
    recorder ==> depth
    recorder ==> info
    recorder ==> det2d
    recorder ==> det3d
    recorder ==> audit

    classDef stream fill:#fef3c7,stroke:#d97706,stroke-width:2px
    classDef module fill:#dbeafe,stroke:#2563eb,stroke-width:2px
    classDef memory fill:#dcfce7,stroke:#16a34a,stroke-width:2px
    classDef external fill:#f3f4f6,stroke:#6b7280,stroke-width:1px
    class color,depth,info,det2d,det3d,audit,object_events,semantic_vec,visual_vec stream
    class detector,osr,wb,rerun,manip,recorder module
    class m2,session,history memory
    class camera external

What this unlocks

This PR adds a live object identity layer for the xArm6 perception/manipulation stack. The detector still sees frame-local masks; WorldBelief turns those observations into stable workspace objects that pick/place and visualization can trust.

Capability	What changed	Why it matters
Stable live object IDs	`WorldBelief` associates current 3D observations against maintained identity state using geometry, labels, CLIP, DINO, support windows, and re-acquisition policy.	A can should keep its identity while it moves, disappears briefly, or is seen again after camera motion.
Manipulation-facing present set	OSR publishes WorldBelief `present_objects` to the `objects` port and `Detection3DArray`.	Pick/place consumes a filtered workspace state, not every noisy frame detection.
Cross-session identity seed	WorldBelief writes compact object evidence and vector evidence into a stable Memory2 history DB, then rehydrates maintained state on restart.	The next process starts with prior identity evidence instead of a blank identity table.
Cleaner Rerun view	Blueprint opens Rerun with annotated image on the left and 3D workspace on the right. Current-frame pointclouds are used for visual blobs.	The display reflects both live camera evidence and trusted WorldBelief state without stale pointcloud trails.
Replay/debug evidence	A dedicated recorder writes a fresh per-run Memory2 DB for color/depth/camera info/detections/audit streams.	We can inspect what the stack saw and what WorldBelief decided without mixing sessions.
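
The first row combines geometry, labels, and CLIP/DINO similarity into one association decision. The sketch below illustrates one way such a score could be computed. It is not the PR's implementation: `association_score`, the weights, the label gate, and the 0.30 m distance gate are all assumptions chosen for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def association_score(
    obs_center: Sequence[float],
    obs_label: str,
    obs_clip: Sequence[float],
    obs_dino: Sequence[float],
    ent_center: Sequence[float],
    ent_label: str,
    ent_clip: Sequence[float],
    ent_dino: Sequence[float],
    max_distance_m: float = 0.30,  # hypothetical geometry gate
    w_geom: float = 0.4,           # hypothetical weights, summing to 1.0
    w_clip: float = 0.3,
    w_dino: float = 0.3,
) -> float:
    """Score in [0, 1] for matching one observation to one maintained entity."""
    distance = math.dist(obs_center, ent_center)
    # Hard gates: a different label or a far-away center is never a match.
    if obs_label != ent_label or distance > max_distance_m:
        return 0.0
    geom = 1.0 - distance / max_distance_m
    # Clamp negative cosine values so dissimilar crops contribute nothing.
    clip = max(0.0, cosine_similarity(obs_clip, ent_clip))
    dino = max(0.0, cosine_similarity(obs_dino, ent_dino))
    return w_geom * geom + w_clip * clip + w_dino * dino
```

In a real association step, a score like this would feed a one-to-one assignment between observations and entities, with unmatched observations becoming new candidates.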

How it works - walkthrough

t	Event	What happens
0	Blueprint starts	The xArm6 WorldBelief blueprint wires RealSense, YOLO-E, OSR, WorldBelief, Rerun, recorder, and pick/place. The stable history DB is opened if configured.
1	Camera sees objects	YOLO-E produces prompt-mode masks. OSR uses color, depth, camera info, and TF to build 3D `Object` observations.
2	Appearance evidence is attached	OSR crops each object and attaches CLIP semantic embeddings and DINO visual embeddings.
3	WorldBelief updates	The identity engine matches observations to existing IDs or creates candidates. Objects become `present` only after enough recent support.
4	Manipulation reads state	`present_objects` are published to the `objects` port and `detections_3d`. Pick/place sees stable workspace objects rather than raw detector churn.
5	Rerun displays state	The annotated image uses current-frame identity assignments, while the 3D view receives trusted boxes plus current-frame colored pointcloud visualization.
6	Object leaves view	Frustum and camera-motion handling keep a hidden identity available for re-acquisition instead of immediately deleting or publishing stale visual blobs.
7	Object returns	WorldBelief can reuse the prior ID when geometry and appearance evidence are strong enough. If evidence is ambiguous, creating a new ID is safer than forcing a wrong merge.
8	Process restarts	Compact WorldBelief evidence is rehydrated from Memory2 history so the identity table does not always start from scratch.
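
Step 3 says an object becomes `present` only after enough recent support. A minimal sketch of that kind of gate follows, assuming a sliding time window and a hit count. The class name `SupportWindow` and the parameters `window_s` and `min_support` are illustrative and may not match the names or policy in `world_belief.py`.

```python
from collections import deque


class SupportWindow:
    """Tracks recent observation timestamps for one identity (illustrative)."""

    def __init__(self, window_s: float = 1.0, min_support: int = 3) -> None:
        self.window_s = window_s
        self.min_support = min_support
        self._hits = deque()  # observation timestamps, oldest first

    def observe(self, ts: float) -> None:
        """Record that the identity was matched in the frame at time ts."""
        self._hits.append(ts)

    def is_present(self, now: float) -> bool:
        """True if at least min_support hits fall within the last window_s seconds."""
        while self._hits and now - self._hits[0] > self.window_s:
            self._hits.popleft()
        return len(self._hits) >= self.min_support
```

With these defaults, a single flickering detection never reaches the present set, and an identity that stops being observed drops out once its hits age past the window.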

Runtime object model

The PR deliberately separates four related concepts:

Layer	Meaning	Used by
Raw detections	YOLO-E masks/classes from the current image.	OSR object construction.
Frame objects	Current-frame 3D objects after WorldBelief assigns IDs.	Annotated image and audit.
Present objects	Objects with enough recent support to be trusted as present.	`objects`, `detections_3d`, pick/place, 3D boxes.
Maintained objects	Remembered identities kept internally for hidden re-acquisition/history.	WorldBelief lifecycle and Memory2 history.

This split is why the Rerun pointcloud can stay visually clean while manipulation still receives stable present objects.
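
As a rough illustration of the split between maintained and present objects, the sketch below filters a maintained identity table down to the manipulation-facing set. The `Lifecycle` enum, the `MaintainedObject` dataclass, and the `present_objects` helper are hypothetical names, not the PR's types.

```python
from dataclasses import dataclass
from enum import Enum


class Lifecycle(Enum):
    CANDIDATE = "candidate"  # observed, but not yet enough support
    PRESENT = "present"      # enough recent support; safe to publish
    HIDDEN = "hidden"        # remembered for re-acquisition, not published


@dataclass
class MaintainedObject:
    object_id: str
    label: str
    lifecycle: Lifecycle


def present_objects(maintained):
    """Return only the identities that manipulation should act on."""
    return [obj for obj in maintained if obj.lifecycle is Lifecycle.PRESENT]
```

Under this model, hidden and candidate identities stay in the maintained table for association and history, while pick/place only ever sees the filtered list.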

Why this design

WorldBelief is live task memory. Memory2 stores durable evidence, but the robot still needs a current materialized workspace state with deterministic identity policy.
Raw detector output is not enough for manipulation. Prompt-mode detection can flicker, split, or merge frame to frame. WorldBelief adds support windows, lifecycle state, and appearance-aware association before publishing objects to pick/place.
Memory2 remains the durable layer. WorldBelief writes object and vector evidence into Memory2 history and can rehydrate from it, but this PR does not claim to ship the final natural-language Memory2 query workflow.
Detector tracking is intentionally not the identity source. The xArm6 blueprint disables YOLO tracking so stable IDs come from the robot-side identity model instead of detector-local track state.
Visualization is separated from manipulation state. Current-frame pointcloud blobs are for visual clarity; `present_objects` and `detections_3d` remain the manipulation-facing outputs.

Main files

Area	Files
Product blueprint	`dimos/robot/manipulators/xarm/blueprints/worldbelief.py`, `dimos/robot/all_blueprints.py`, `dimos/manipulation/blueprints.py`
OSR integration	`dimos/perception/object_scene_registration.py`
Identity engine	`dimos/perception/detection/world_belief.py`, `identity_association.py`, `identity_features.py`
Durable history	`dimos/perception/detection/world_belief_history.py`
Object representation	`dimos/perception/detection/type/detection3d/object.py`
Embeddings/detector support	`dimos/models/embedding/dino.py`, `clip.py`, `mobileclip.py`, `dimos/perception/detection/detectors/yoloe.py`
Recording	`dimos/robot/manipulators/xarm/worldbelief_recorder.py`, `dimos/memory2/module.py`

Known limits / follow-ups

Exact same-location swaps between visually similar cans are still a hard identity case. If the physical objects exchange positions perfectly, stronger appearance mismatch policy may need a follow-up threshold split.
CLIP + DINO improve identity evidence but add latency. If annotated-image smoothness becomes the priority, the next step is to draw a fast visual overlay before embeddings complete, while still publishing trusted WorldBelief outputs only after embeddings are available.
Runtime detector prompt updates currently replace the active prompt list. Append-to-default prompt workflow is a follow-up.
Memory2 vector evidence is recorded for future search/recall work, but this PR does not add the final agent query API.

greptile-apps · 2026-06-30T16:58:53Z

Greptile Summary

This PR adds the xArm6 WorldBelief perception stack.

New WorldBelief identity association and durable history storage.
OSR integration for embeddings, present objects, audit events, and annotated images.
xArm6 hardware blueprint wiring camera, detector, WorldBelief, Rerun, recorder, and pick/place.
DINO visual embeddings and updated stable Detection3D object output.
Dedicated Memory2 recorder for replay-critical perception streams.

Confidence Score: 4/5

The restart and history restore path can publish incorrect identity state until the rehydration lifecycle is fixed.

Restored objects keep old timestamps and an empty support window.
Normal restart gaps can prevent durable IDs from being reused.
Rehydrated-but-not-present objects can make new live objects wait behind the candidate gate.

dimos/perception/detection/world_belief_history.py, dimos/perception/detection/world_belief.py

Important Files Changed

Filename	Overview
dimos/perception/detection/world_belief.py	Adds support-gated object identity, association, candidate gating, re-acquisition, audit state, and history restore wiring.
dimos/perception/detection/world_belief_history.py	Adds Memory2-backed object and vector evidence history plus compact state rehydration.
dimos/perception/object_scene_registration.py	Connects OSR to WorldBelief, embeddings, frustum handling, present-set publication, audit events, and visualization outputs.
dimos/perception/detection/identity_association.py	Adds typed association evidence and one-to-one frame assignment policy.
dimos/perception/detection/type/detection3d/object.py	Extends object metadata and publishes stable object IDs with stored fitted 3D geometry.
dimos/robot/manipulators/xarm/blueprints/worldbelief.py	Adds the xArm6 WorldBelief hardware blueprint and runtime configuration.
dimos/robot/manipulators/xarm/worldbelief_recorder.py	Adds a timestamped recorder for camera, detection, pointcloud, and audit streams.
dimos/models/embedding/dino.py	Adds a DINOv2 image embedding wrapper for visual identity evidence.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[RealSense RGB-D frames] --> B[YOLO-E detections]
    B --> C[ObjectSceneRegistration]
    A --> C
    C --> D[3D observations and embeddings]
    D --> E[WorldBelief]
    E --> F[Present objects]
    F --> G[Pick and place]
    F --> H[Rerun and Detection3DArray]
    E --> I[WorldBelief history DB]
    I --> J[Rehydrate on restart]
    J --> E
    C --> K[Annotated image and pointcloud]
    K --> H
    A --> L[WorldBelief recorder]
    C --> L
    E --> L


greptile-apps · 2026-06-30T16:58:57Z

+                visual_embedding_model=a["visual_embedding_model"],
+                visual_embedding_device=a["visual_embedding_device"],
+                visual_embedding_dim=a["visual_embedding_dim"],
+            )


Rehydrated IDs Age Out

When the process restarts more than the configured re-acquisition window after the last history write, restored entities get `window=[]` and keep their old `last_seen` timestamps. The first live frame advances WorldBelief to the current camera time, so restored IDs are not present and are rejected by the re-acquisition recency check; the next observation creates a new identity instead of reusing the durable one.
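
The failure mode described above can be reproduced with a small model. The sketch below is illustrative only: `RestoredEntity`, `can_reacquire`, and `rebase_rehydrated` are hypothetical names, and rebasing `last_seen` to the first live frame is one possible mitigation, not necessarily the fix the PR should adopt.

```python
from dataclasses import dataclass, field


@dataclass
class RestoredEntity:
    object_id: str
    last_seen: float                            # timestamp loaded from history
    window: list = field(default_factory=list)  # empty support window after restore
    rehydrated: bool = True


def can_reacquire(entity: RestoredEntity, now: float, reacquire_window_s: float) -> bool:
    """Recency check: only identities seen recently enough may be reused."""
    return now - entity.last_seen <= reacquire_window_s


def rebase_rehydrated(entities, first_live_ts: float) -> None:
    """Possible mitigation: age restored identities from the first live frame."""
    for entity in entities:
        if entity.rehydrated:
            entity.last_seen = first_live_ts
            entity.rehydrated = False
```

For example, an entity restored with `last_seen=100.0` fails a 30-second recency check at `now=1000.0`, and passes it after `rebase_rehydrated` is called with `first_live_ts=1000.0`.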

greptile-apps · 2026-06-30T16:58:58Z

+        with self._lock:
+            now = self._frame_time(objects, frame_ts)
+            self._now = now
+            scene_established_at_frame_start = len(self._entities) >= max(3, self._min_support)


History Enables Candidate Gate

After rehydration, `len(self._entities)` can already satisfy this scene-established check even though none of those restored entities are present. A real object that does not match stale history is then forced through `_update_new_candidate()` and withheld until it has enough repeated support, so the first valid objects after a restart can be missing from the manipulation-facing present set.
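
To illustrate the difference, the sketch below compares a check that counts every maintained entity with one that counts only present entities. Both functions and the dictionary-based entity representation are hypothetical simplifications; only the `max(3, min_support)` threshold is taken from the quoted diff.

```python
def scene_established_all(entities, min_support: int) -> bool:
    """Mirrors the quoted check: every maintained entity counts."""
    return len(entities) >= max(3, min_support)


def scene_established_present_only(entities, min_support: int) -> bool:
    """Alternative: only entities currently marked present count."""
    present_count = sum(1 for entity in entities if entity["present"])
    return present_count >= max(3, min_support)
```

With three rehydrated entities that are not yet present, the first check reports an established scene and the second does not, so a new live object would not be pushed behind the candidate gate by stale history alone.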

Initial xArm6 WorldBelief perception stack

914c682

jhengyilin requested review from leshy, mustafab0, paul-nechifor and spomichter as code owners June 30, 2026 16:56

jhengyilin wants to merge 1 commit into dimensionalOS:main from jhengyilin:jhengyi/perception-worldbelief-clean-0625
