Initial xArm6 WorldBelief perception stack #2665
Greptile Summary
This PR adds the xArm6 WorldBelief perception stack.
Confidence Score: 4/5
The restart and history restore path can publish incorrect identity state until the rehydration lifecycle is fixed.
dimos/perception/detection/world_belief_history.py, dimos/perception/detection/world_belief.py
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[RealSense RGB-D frames] --> B[YOLO-E detections]
B --> C[ObjectSceneRegistration]
A --> C
C --> D[3D observations and embeddings]
D --> E[WorldBelief]
E --> F[Present objects]
F --> G[Pick and place]
F --> H[Rerun and Detection3DArray]
E --> I[WorldBelief history DB]
I --> J[Rehydrate on restart]
J --> E
C --> K[Annotated image and pointcloud]
K --> H
A --> L[WorldBelief recorder]
C --> L
E --> L
Reviews (1): Last reviewed commit: "Initial xArm6 WorldBelief perception st..."
    visual_embedding_model=a["visual_embedding_model"],
    visual_embedding_device=a["visual_embedding_device"],
    visual_embedding_dim=a["visual_embedding_dim"],
)
When the process restarts more than the configured reacquisition window after the last history write, restored entities get window=[] and keep their old last_seen timestamps. The first live frame advances WorldBelief to the current camera time, so restored IDs are not present and are rejected by the reacquisition recency check; the next observation creates a new identity instead of reusing the durable one.
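The failure mode described in this comment can be illustrated with a minimal sketch. The names `RestoredEntity`, `can_reacquire`, and `reacquire_window_s` are hypothetical and are not taken from `world_belief.py`; the sketch only assumes that the recency check compares the current frame time against the restored `last_seen` timestamp.

```python
from dataclasses import dataclass, field


@dataclass
class RestoredEntity:
    """Hypothetical stand-in for an identity restored from the history DB."""
    entity_id: str
    last_seen: float                                    # timestamp of the last history write
    window: list[float] = field(default_factory=list)   # empty after rehydration


def can_reacquire(entity: RestoredEntity, now: float, reacquire_window_s: float) -> bool:
    """Assumed recency check: reuse an identity only if it was seen recently."""
    return (now - entity.last_seen) <= reacquire_window_s


# The process stopped at t=100 s and restarts long after the window has elapsed.
cup = RestoredEntity("cup-1", last_seen=100.0)
short_downtime = can_reacquire(cup, now=105.0, reacquire_window_s=30.0)   # still eligible
long_downtime = can_reacquire(cup, now=1000.0, reacquire_window_s=30.0)   # rejected
```

Under this assumption, a restart after a long downtime rejects every restored identity, so the next observation of the same physical object would be assigned a new ID.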
with self._lock:
    now = self._frame_time(objects, frame_ts)
    self._now = now
    scene_established_at_frame_start = len(self._entities) >= max(3, self._min_support)
History Enables Candidate Gate
After rehydration, len(self._entities) can already satisfy this scene-established check even though none of those restored entities are present. A real object that does not match stale history is then forced through _update_new_candidate() and withheld until it has enough repeated support, so the first valid objects after a restart can be missing from the manipulation-facing present set.
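A small sketch may help show why the gate can trip on stale history. The first function mirrors the quoted check; the second is one possible alternative that counts only present entities. It is an illustration of the concern, not the fix adopted in the PR, and the `Entity` class and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """Hypothetical entity record; only the fields needed for the gate."""
    entity_id: str
    present: bool


def scene_established_by_count(entities: list[Entity], min_support: int) -> bool:
    # Mirrors the quoted check: every known entity counts, present or not.
    return len(entities) >= max(3, min_support)


def scene_established_by_present(entities: list[Entity], min_support: int) -> bool:
    # Possible alternative (assumption): count only entities that are present now.
    present_count = sum(1 for e in entities if e.present)
    return present_count >= max(3, min_support)


# Four identities restored from history, none observed since the restart.
restored = [Entity(f"id-{i}", present=False) for i in range(4)]
gate_on_history = scene_established_by_count(restored, min_support=2)    # gate is active
gate_on_present = scene_established_by_present(restored, min_support=2)  # gate stays off
```

With the count-based check, the first real object after a restart is routed through the candidate path even though the scene contains no confirmed present objects.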
WorldBelief live object identity for xArm6 manipulation
flowchart TB
    camera([RealSense RGB-D camera])
    detector["YOLO-E prompt detector<br/><i>prompt mode, tracker disabled in xArm blueprint</i>"]
    osr["ObjectSceneRegistration<br/><i>RGB-D objects + CLIP/DINO crop embeddings</i>"]
    wb["WorldBelief<br/><i>stable IDs, support windows, re-acquisition</i>"]
    rerun["Rerun<br/><i>annotated image + 3D workspace</i>"]
    manip["PickAndPlaceModule<br/><i>reads present objects</i>"]
    recorder["XArm6WorldBeliefRecorder<br/><i>records replay-critical streams</i>"]
    subgraph m2 ["memory2"]
        direction TB
        subgraph session ["per-run recording DB"]
            direction LR
            color[(color_image)]
            depth[(depth_image)]
            info[(camera_info)]
            det2d[(detections_2d)]
            det3d[(detections_3d)]
            audit[(worldbelief_audit)]
        end
        subgraph history ["worldbelief_history.db"]
            direction LR
            object_events[(object evidence)]
            semantic_vec[(semantic vectors)]
            visual_vec[(visual vectors)]
        end
    end
    camera --> detector
    detector --> osr
    camera --> osr
    osr -->|"frame objects"| wb
    osr -->|"annotated image + current-frame pointcloud"| rerun
    wb -->|"present objects"| manip
    wb -->|"detections_3d + audit"| rerun
    wb ==> object_events
    wb ==> semantic_vec
    wb ==> visual_vec
    object_events -. "rehydrate compact identity state on restart" .-> wb
    camera ==> recorder
    osr ==> recorder
    wb ==> recorder
    recorder ==> color
    recorder ==> depth
    recorder ==> info
    recorder ==> det2d
    recorder ==> det3d
    recorder ==> audit
    classDef stream fill:#fef3c7,stroke:#d97706,stroke-width:2px
    classDef module fill:#dbeafe,stroke:#2563eb,stroke-width:2px
    classDef memory fill:#dcfce7,stroke:#16a34a,stroke-width:2px
    classDef external fill:#f3f4f6,stroke:#6b7280,stroke-width:1px
    class color,depth,info,det2d,det3d,audit,object_events,semantic_vec,visual_vec stream
    class detector,osr,wb,rerun,manip,recorder module
    class m2,session,history memory
    class camera external
What this unlocks
This PR adds a live object identity layer for the xArm6 perception/manipulation stack. The detector still sees frame-local masks; WorldBelief turns those observations into stable workspace objects that pick/place and visualization can trust.
WorldBelief associates current 3D observations against maintained identity state using geometry, labels, CLIP, DINO, support windows, and re-acquisition policy.
present_objects to the objects port and Detection3DArray.
How it works - walkthrough
Object observations.
present only after enough recent support.
present_objects are published to the objects port and detections_3d. Pick/place sees stable workspace objects rather than raw detector churn.
Runtime object model
The PR deliberately separates four related concepts:
objects, detections_3d, pick/place, 3D boxes.
This split is why the Rerun pointcloud can stay visually clean while manipulation still receives stable present objects.
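The walkthrough's rule that an object becomes present only after enough recent support can be sketched as a sliding time window over observation timestamps. This is a minimal illustration under stated assumptions: `SupportWindow`, `window_s`, and `min_support` are hypothetical names, and the actual WorldBelief policy may weigh evidence differently.

```python
from collections import deque


class SupportWindow:
    """Hypothetical sliding window of observation timestamps for one identity."""

    def __init__(self, window_s: float, min_support: int) -> None:
        self.window_s = window_s        # how far back an observation still counts
        self.min_support = min_support  # observations required to be present
        self._hits: deque[float] = deque()

    def observe(self, ts: float) -> None:
        """Record one matched observation at time ts (seconds)."""
        self._hits.append(ts)

    def is_present(self, now: float) -> bool:
        """Drop observations older than the window, then check the support count."""
        while self._hits and now - self._hits[0] > self.window_s:
            self._hits.popleft()
        return len(self._hits) >= self.min_support


window = SupportWindow(window_s=2.0, min_support=3)
for ts in (0.0, 0.5, 1.0):
    window.observe(ts)
supported_now = window.is_present(now=1.0)     # three recent hits
supported_later = window.is_present(now=10.0)  # all hits have aged out
```

A single spurious detection never reaches `min_support`, which is one way a presence rule of this kind can shield pick/place from raw detector churn.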
Why this design
present_objects and detections_3d remain the manipulation-facing outputs.
Main files
dimos/robot/manipulators/xarm/blueprints/worldbelief.py, dimos/robot/all_blueprints.py, dimos/manipulation/blueprints.py
dimos/perception/object_scene_registration.py
dimos/perception/detection/world_belief.py, identity_association.py, identity_features.py
dimos/perception/detection/world_belief_history.py
dimos/perception/detection/type/detection3d/object.py
dimos/models/embedding/dino.py, clip.py, mobileclip.py, dimos/perception/detection/detectors/yoloe.py
dimos/robot/manipulators/xarm/worldbelief_recorder.py, dimos/memory2/module.py
Known limits / follow-ups