
feat(teleop): hosted Go2 one-session module + robot commands, multicam, battery, LiveKit #2562

Draft
ruthwikdasyam wants to merge 53 commits into feat/webrtc-transport from ruthwik/hostedteleop/2

Conversation

ruthwikdasyam (Contributor) commented Jun 22, 2026

Problem

feat/webrtc-transport shipped hosted teleop as a pure transport swap, but the Go2 driver and the hosted state plane lived in separate modules, and therefore in separate processes. Because the broker provider is a per-process singleton and GO2Connection is dedicated_worker=True, the state-plane module opened a second Cloudflare session that the operator never saw, so robot→operator telemetry (battery, command-plane stats) silently never arrived. The transport path also had no robot-command channel (posture/actions), no multi-camera support, and only one SFU backend (Cloudflare).

Solution

Consolidate the hosted control plane into the Go2 driver as one dedicated_worker module (Go2HostedConnection(GO2Connection)) so everything shares one CF session, then build the operator-facing features on top:

  • One-session module: dimos/robot/unitree/go2/hosted_connection.py. Folds the state plane and camera mux into the driver's process; replaces the standalone TeleopStateBridge (removed).
  • Robot commands (allow-listed sport_cmd on state_reliable, acked back): posture (StandReady combo / Sit / StandDown / RecoveryStand), actions (Hello / Stretch / FrontPounce / FrontJump), and an E-stop (Damp) sent out of band from the twist stream. New sport_command(api_id) + bidirectional set_rage_mode(enable) on the connection.
  • Speed modes: set_mode → Normal / High (browser-side scale) + Rage (firmware FSM toggle).
  • Multi-camera mux: composites Go2 + RealSense into one video track, operator-selectable (cam1 / cam2 / side-by-side via camera_select), with graceful degradation to Go2-only. New teleop-hosted-go2-multicam blueprint.
  • Battery telemetry: get_battery_soc() + _on_lowstate (caches bms_state.soc).
  • Command-plane telemetry (compute-and-forward): the robot measures latency/jitter/rate over the inbound twist wire and forwards snapshots + cmd_ack on state_reliable_back; inline pong answering (_maybe_answer_ping) is restored so clock-sync RTT works on the transport path.
  • LiveKit transport: LiveKitBrokerProvider + LiveKitTransport/LiveKitVideoTransport as a drop-in alternative SFU to Cloudflare; new teleop-hosted-go2-livekit blueprint.
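A minimal sketch of the allow-listed command path, assuming a JSON message shaped like {"type": "sport_cmd", "name": ..., "nonce": ...} and an ack shaped like {"type": "cmd_ack", ...}. The field names, the ALLOWED set, and handle_sport_cmd are hypothetical; `run` stands in for the connection call (e.g. sport_command). It only shows the shape: reject anything off the list before it reaches the firmware, and always answer with an ack.

```python
from typing import Any, Callable

# Hypothetical allow-list of operator-requestable commands.
# StandReady is a combo (several firmware calls), not a single api_id.
POSTURES = {"StandReady", "Sit", "StandDown", "RecoveryStand"}
ACTIONS = {"Hello", "Stretch", "FrontPounce", "FrontJump"}
ALLOWED = POSTURES | ACTIONS | {"Damp"}


def handle_sport_cmd(msg: dict[str, Any], run: Callable[[str], bool]) -> dict[str, Any]:
    """Validate one sport_cmd message and build the ack for state_reliable_back.

    `run` stands in for the robot-side call and returns True on success.
    """
    name = msg.get("name")
    nonce = msg.get("nonce")
    if name not in ALLOWED:
        # Unknown or disallowed commands never reach the firmware.
        return {"type": "cmd_ack", "nonce": nonce, "ok": False, "error": "not allowed"}
    try:
        ok = bool(run(name))
    except Exception as exc:  # a failed RPC is reported back, not raised
        return {"type": "cmd_ack", "nonce": nonce, "ok": False, "error": str(exc)}
    return {"type": "cmd_ack", "nonce": nonce, "ok": ok}
```

Keeping the allow-list on the robot side means a buggy or hostile client cannot reach arbitrary firmware calls; the ack gives the operator UI a definite result per nonce.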

Removed the teleop-state-bridge and hosted-teleop-recorder blueprints (the latter only set a path; compose the generic teleop-recorder instead).

Operator-side cockpit UI, STUN/TURN display, and the ICE-negotiation speedup (10s→<1s) live in the dimensional-teleop repo (branch ruthwik/UI), in a separate PR.

How to Test

dimos run teleop-hosted-go2-transport -o transports.broker.api_key=dtk_live_...

Connect from the operator web client: drive (WASD → twist), fire posture/actions, watch live battery + command-plane telemetry in the HUD. For the two-camera path (needs a RealSense on the dimos host) use teleop-hosted-go2-multicam; for the LiveKit SFU use teleop-hosted-go2-livekit.

Contributor License Agreement

  • I have read and approved the CLA.

codecov Bot commented Jun 22, 2026

❌ 3 Tests Failed:

Tests completed: 1907 | Failed: 3 | Passed: 1904 | Skipped: 155
Top 3 failed tests, by shortest run time:
dimos.protocol.pubsub.impl.webrtc.test_transport::test_transport_overrides_coerce_string_values
Stack Traces | 0.001s run time
def test_transport_overrides_coerce_string_values() -> None:
        """CLI/env overrides arrive as raw strings; non-str fields must coerce, not pass through."""
        bp = Blueprint(blueprints=()).transports({("topic", FakeLCMMsg): MockTransport("topic")})
>       new_bp = _apply_transport_overrides(bp, {"mock": {"count": "5"}})
E       NameError: name '_apply_transport_overrides' is not defined

bp         = Blueprint(blueprints=(), disabled_modules_tuple=(), transport_map=mappingproxy({('topic', <class 'dimos.protocol.pubsu...lobal_config_overrides=mappingproxy({}), remapping_map=mappingproxy({}), requirement_checks=(), configurator_checks=())

.../impl/webrtc/test_transport.py:280: NameError
dimos.robot.test_all_blueprints_generation::test_all_blueprints_is_current
Stack Traces | 2.61s run time
def test_all_blueprints_is_current() -> None:
        root = DIMOS_PROJECT_ROOT / "dimos"
        all_blueprints, all_modules = _scan_for_blueprints(root)
    
        common = set(all_blueprints.keys()) & set(all_modules.keys())
        assert not common, (
            f"Names must be unique across blueprints and modules, "
            f"but these appear in both: {sorted(common)}"
        )
    
        generated_content = _generate_all_blueprints_content(all_blueprints, all_modules)
    
        file_path = root / "robot" / "all_blueprints.py"
    
        if "CI" in os.environ:
            if not file_path.exists():
                pytest.fail(f"all_blueprints.py does not exist at {file_path}")
    
            current_content = file_path.read_text()
            if current_content != generated_content:
                diff = difflib.unified_diff(
                    current_content.splitlines(keepends=True),
                    generated_content.splitlines(keepends=True),
                    fromfile="all_blueprints.py (current)",
                    tofile="all_blueprints.py (generated)",
                )
                diff_str = "".join(diff)
>               pytest.fail(
                    f"all_blueprints.py is out of date. Run "
                    f"`pytest dimos/robot/test_all_blueprints_generation.py` locally to update.\n\n"
                    f"Diff:\n{diff_str}"
                )
E               Failed: all_blueprints.py is out of date. Run `pytest dimos/robot/test_all_blueprints_generation.py` locally to update.
E               
E               Diff:
E               --- all_blueprints.py (current)
E               +++ all_blueprints.py (generated)
E               @@ -74,8 +74,8 @@
E                    "path-planner-eval": "dimos.navigation.nav_3d.evaluator.blueprints:path_planner_eval",
E                    "teleop-hosted-go2": "dimos.teleop.quest_hosted.blueprints:teleop_hosted_go2",
E                    "teleop-hosted-go2-livekit": "dimos.teleop.quest_hosted.blueprints:teleop_hosted_go2_livekit",
E               +    "teleop-hosted-go2-multicam": "dimos.teleop.quest_hosted.blueprints:teleop_hosted_go2_multicam",
E                    "teleop-hosted-go2-transport": "dimos.teleop.quest_hosted.blueprints:teleop_hosted_go2_transport",
E               -    "teleop-hosted-go2-multicam": "dimos.teleop.quest_hosted.blueprints:teleop_hosted_go2_multicam",
E                    "teleop-hosted-xarm7": "dimos.teleop.quest_hosted.blueprints:teleop_hosted_xarm7",
E                    "teleop-phone": "dimos.teleop.phone.blueprints:teleop_phone",
E                    "teleop-phone-go2": "dimos.teleop.phone.blueprints:teleop_phone_go2",
E               @@ -165,6 +165,7 @@
E                    "g1-whole-body-connection": "dimos.robot.unitree.g1.wholebody_connection.G1WholeBodyConnection",
E                    "go2-connection": "dimos.robot.unitree.go2.connection.GO2Connection",
E                    "go2-fleet-connection": "dimos.robot.unitree.go2.fleet_connection.Go2FleetConnection",
E               +    "go2-hosted-connection": "dimos.robot.unitree.go2.hosted_connection.Go2HostedConnection",
E                    "go2-memory": "dimos.robot.unitree.go2.blueprints.smart.unitree_go2.Go2Memory",
E                    "go2-teleop-module": "dimos.teleop.quest.quest_extensions.Go2TeleopModule",
E                    "google-maps-skill-container": "dimos.agents.skills.google_maps_skill_container.GoogleMapsSkillContainer",

all_blueprints = {'alfred-nav': 'dimos.robot.diy.alfred.blueprints.alfred_nav:alfred_nav', 'coordinator-basic': 'dimos.control.blueprin...sian_ik_mock', 'coordinator-cartesian-ik-piper': 'dimos.control.blueprints.teleop:coordinator_cartesian_ik_piper', ...}
all_modules = {'alfred-high-level': 'dimos.robot.diy.alfred.effector_high_level.AlfredHighLevel', 'arm-teleop-module': 'dimos.teleop..._navigation.BBoxNavigationModule', 'b1-connection-module': 'dimos.robot.unitree.b1.connection.B1ConnectionModule', ...}
common     = set()
current_content = '# Copyright 2025-2026 Dimensional Inc.\n#\n# Licensed under the Apache License, Version 2.0 (the "License");\n# you m...ebsocket_vis_module.WebsocketVisModule",\n    "zed-camera": "dimos.hardware.sensors.camera.zed.camera.ZEDCamera",\n}\n'
diff       = <generator object unified_diff at 0xff3c53b3b100>
diff_str   = '--- all_blueprints.py (current)\n+++ all_blueprints.py (generated)\n@@ -74,8 +74,8 @@\n     "path-planner-eval": "dim...e",\n     "google-maps-skill-container": "dimos.agents.skills.google_maps_skill_container.GoogleMapsSkillContainer",\n'
file_path  = PosixPath('.../dimos/robot/all_blueprints.py')
generated_content = '# Copyright 2025-2026 Dimensional Inc.\n#\n# Licensed under the Apache License, Version 2.0 (the "License");\n# you m...ebsocket_vis_module.WebsocketVisModule",\n    "zed-camera": "dimos.hardware.sensors.camera.zed.camera.ZEDCamera",\n}\n'
root       = PosixPath('.../dimos/dimos/dimos')

dimos/robot/test_all_blueprints_generation.py:66: Failed
dimos.robot.test_all_blueprints::test_blueprint_is_valid[teleop-hosted-go2-multicam]
Stack Traces | 3.14s run time
blueprint_name = 'teleop-hosted-go2-multicam'

    @pytest.mark.parametrize("blueprint_name", UBUNTU_BLUEPRINTS)
    def test_blueprint_is_valid(blueprint_name: str) -> None:
        """Validate blueprints that should import on the ubuntu-latest runner."""
>       _check_blueprint(blueprint_name)

blueprint_name = 'teleop-hosted-go2-multicam'

dimos/robot/test_all_blueprints.py:105: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
dimos/robot/test_all_blueprints.py:81: in _check_blueprint
    blueprint = get_blueprint_by_name(blueprint_name)
        blueprint_name = 'teleop-hosted-go2-multicam'
        message    = "Failed to pull LFS file .../dimos/data/.lfs/xarm7.tar.gz after 3 attempts: Command '['git', 'lfs', 'pull', '--include', 'data/.lfs/xarm7.tar.gz']' returned non-zero exit status 1."
dimos/robot/get_all_blueprints.py:47: in get_blueprint_by_name
    module = __import__(module_path, fromlist=[attr])
        attr       = 'teleop_hosted_go2_multicam'
        module_path = 'dimos.teleop.quest_hosted.blueprints'
        name       = 'teleop-hosted-go2-multicam'
.../teleop/quest_hosted/blueprints.py:17: in <module>
    from dimos.control.blueprints.teleop import coordinator_teleop_xarm7
        __builtins__ = <builtins>
        __cached__ = '.../quest_hosted/__pycache__/blueprints.cpython-312.pyc'
        __doc__    = 'Hosted teleop blueprints (WebRTC transport).'
        __file__   = '.../work/dimos/dimos/.../teleop/quest_hosted/blueprints.py'
        __loader__ = <_frozen_importlib_external.SourceFileLoader object at 0xff3c42faeba0>
        __name__   = 'dimos.teleop.quest_hosted.blueprints'
        __package__ = 'dimos.teleop.quest_hosted'
        __spec__   = ModuleSpec(name='dimos.teleop.quest_hosted.blueprints', loader=<_frozen_importlib_external.SourceFileLoader object at 0xff3c42faeba0>, origin='.../work/dimos/dimos/.../teleop/quest_hosted/blueprints.py')
.../control/blueprints/teleop.py:185: in <module>
    *_mujoco_if_sim(str(XARM7_SIM_PATH), _xarm7_teleop_cfg.dof),
        Blueprint  = <class 'dimos.core.coordination.blueprints.Blueprint'>
        Buttons    = <class 'dimos.teleop.quest.quest_types.Buttons'>
        ControlCoordinator = <class 'dimos.control.coordinator.ControlCoordinator'>
        JointState = <class 'dimos.msgs.sensor_msgs.JointState.JointState'>
        LCMTransport = <class 'dimos.core.transport.LCMTransport'>
        MujocoSimModule = <class 'dimos.simulation.engines.mujoco_sim_module.MujocoSimModule'>
        PIPER_FK_MODEL = <[RuntimeError("Failed to pull LFS file .../dimos/data/.lfs/piper_description.tar.gz after 3 attem....lfs/piper_description.tar.gz']' returned non-zero exit status 1.") raised in repr()] LfsPath object at 0xff3c43644750>
        PIPER_SIM_PATH = <[RuntimeError("Failed to pull LFS file .../dimos/data/.lfs/piper.tar.gz after 3 attempts: Command...ude', 'data/.lfs/piper.tar.gz']' returned non-zero exit status 1.") raised in repr()] LfsPath object at 0xff3c432cda50>
        PoseStamped = <class 'dimos.msgs.geometry_msgs.PoseStamped.PoseStamped'>
        XARM6_FK_MODEL = <[RuntimeError("Failed to pull LFS file .../dimos/data/.lfs/xarm_description.tar.gz after 3 attemp.../.lfs/xarm_description.tar.gz']' returned non-zero exit status 1.") raised in repr()] LfsPath object at 0xff3c432cf850>
        XARM6_SIM_PATH = <[RuntimeError("Failed to pull LFS file .../dimos/data/.lfs/xarm6.tar.gz after 3 attempts: Command...ude', 'data/.lfs/xarm6.tar.gz']' returned non-zero exit status 1.") raised in repr()] LfsPath object at 0xff3c432cc950>
        XARM7_FK_MODEL = <[RuntimeError("Failed to pull LFS file .../dimos/data/.lfs/xarm_description.tar.gz after 3 attemp.../.lfs/xarm_description.tar.gz']' returned non-zero exit status 1.") raised in repr()] LfsPath object at 0xff3c432cd6d0>
        XARM7_SIM_PATH = <[RuntimeError("Failed to pull LFS file .../dimos/data/.lfs/xarm7.tar.gz after 3 attempts: Command...ude', 'data/.lfs/xarm7.tar.gz']' returned non-zero exit status 1.") raised in repr()] LfsPath object at 0xff3c432cf3d0>
        __builtins__ = <builtins>
        __cached__ = '.../blueprints/__pycache__/teleop.cpython-312.pyc'
        __doc__    = 'Advanced control coordinator blueprints: servo, velocity, cartesian IK, and teleop IK.\n\nTeleop blueprints switch be...Cartesian IK (Piper, real-only)\n    dimos run coordinator-teleop-dual               # TeleopIK dual arm (real-only)\n'
        __file__   = '.../work/dimos/dimos/.../control/blueprints/teleop.py'
        __loader__ = <_frozen_importlib_external.SourceFileLoader object at 0xff3c42faef30>
        __name__   = 'dimos.control.blueprints.teleop'
        __package__ = 'dimos.control.blueprints'
        __spec__   = ModuleSpec(name='dimos.control.blueprints.teleop', loader=<_frozen_importlib_external.SourceFileLoader object at 0xff3c42faef30>, origin='.../work/dimos/dimos/.../control/blueprints/teleop.py')
        _catalog_piper = <function piper at 0xff3c435fc7c0>
        _catalog_xarm6 = <function xarm6 at 0xff3c435ffc40>
        _catalog_xarm7 = <function xarm7 at 0xff3c435fdc60>
        _is_sim    = ''
        _mock_6dof_cfg = <[RuntimeError("Failed to pull LFS file .../dimos/data/.lfs/piper_description.tar.gz after 3 attem.../piper_description.tar.gz']' returned non-zero exit status 1.") raised in repr()] RobotConfig object at 0xff3c3ba90aa0>
        _mujoco_if_sim = <function _mujoco_if_sim at 0xff3c3ba328e0>
        _piper_cfg = <[RuntimeError("Failed to pull LFS file .../dimos/data/.lfs/piper_description.tar.gz after 3 attem.../piper_description.tar.gz']' returned non-zero exit status 1.") raised in repr()] RobotConfig object at 0xff3c3ba64410>
        _piper_teleop_cfg = <[RuntimeError("Failed to pull LFS file .../dimos/data/.lfs/piper_description.tar.gz after 3 attem.../piper_description.tar.gz']' returned non-zero exit status 1.") raised in repr()] RobotConfig object at 0xff3c3ba906e0>
        _xarm6_cfg = <[RuntimeError("Failed to pull LFS file .../dimos/data/.lfs/xarm_description.tar.gz after 3 attemp...s/xarm_description.tar.gz']' returned non-zero exit status 1.") raised in repr()] RobotConfig object at 0xff3c3ba64500>
        _xarm6_teleop_cfg = <[RuntimeError("Failed to pull LFS file .../dimos/data/.lfs/xarm_description.tar.gz after 3 attemp...s/xarm_description.tar.gz']' returned non-zero exit status 1.") raised in repr()] RobotConfig object at 0xff3c3ba90780>
        _xarm7_teleop_cfg = <[RuntimeError("Failed to pull LFS file .../dimos/data/.lfs/xarm_description.tar.gz after 3 attemp...s/xarm_description.tar.gz']' returned non-zero exit status 1.") raised in repr()] RobotConfig object at 0xff3c3ba90280>
        annotations = _Feature((3, 7, 0, 'beta', 1), None, 16777216)
        autoconnect = <function autoconnect at 0xff3d13811120>
        coordinator_cartesian_ik_mock = <[RuntimeError("Failed to pull LFS file .../dimos/data/.lfs/piper_description.tar.gz after 3 attem...fs/piper_description.tar.gz']' returned non-zero exit status 1.") raised in repr()] Blueprint object at 0xff3c3bed4200>
        coordinator_cartesian_ik_piper = <[RuntimeError("Failed to pull LFS file .../dimos/data/.lfs/piper_description.tar.gz after 3 attem...fs/piper_description.tar.gz']' returned non-zero exit status 1.") raised in repr()] Blueprint object at 0xff3c3bed48f0>
        coordinator_combined_xarm6 = Blueprint(blueprints=(BlueprintAtom(kwargs={'hardware': [HardwareComponent(hardware_id='arm', hardware_type=<HardwareT...lobal_config_overrides=mappingproxy({}), remapping_map=mappingproxy({}), requirement_checks=(), configurator_checks=())
        coordinator_servo_xarm6 = Blueprint(blueprints=(BlueprintAtom(kwargs={'hardware': [HardwareComponent(hardware_id='arm', hardware_type=<HardwareT...lobal_config_overrides=mappingproxy({}), remapping_map=mappingproxy({}), requirement_checks=(), configurator_checks=())
        coordinator_velocity_xarm6 = Blueprint(blueprints=(BlueprintAtom(kwargs={'hardware': [HardwareComponent(hardware_id='arm', hardware_type=<HardwareT...lobal_config_overrides=mappingproxy({}), remapping_map=mappingproxy({}), requirement_checks=(), configurator_checks=())
        global_config = GlobalConfig(robot_ip=None, robot_ips=None, unitree_aes_128_key=None, xarm7_ip=None, xarm6_ip=None, can_port=None, dev...e, obstacle_avoidance=True, detection_model='moondream', listen_host='127.0.0.1', dimsim_scene='apt', dimsim_port=8090)
        make_gripper_joints = <function make_gripper_joints at 0xff3c7feab060>
dimos/utils/data.py:369: in __str__
    return str(self._ensure_downloaded())
        self       = <[RuntimeError("Failed to pull LFS file .../dimos/data/.lfs/xarm7.tar.gz after 3 attempts: Command...ude', 'data/.lfs/xarm7.tar.gz']' returned non-zero exit status 1.") raised in repr()] LfsPath object at 0xff3c432cf3d0>
dimos/utils/data.py:347: in _ensure_downloaded
    cache = get_data(filename)
        cache      = None
        filename   = 'xarm7/scene.xml'
        self       = <[RuntimeError("Failed to pull LFS file .../dimos/data/.lfs/xarm7.tar.gz after 3 attempts: Command...ude', 'data/.lfs/xarm7.tar.gz']' returned non-zero exit status 1.") raised in repr()] LfsPath object at 0xff3c432cf3d0>
dimos/utils/data.py:304: in get_data
    archive_path = _decompress_archive(_pull_lfs_archive(archive_name))
        archive_name = 'xarm7'
        data_dir   = PosixPath('.../dimos/dimos/data')
        file_path  = PosixPath('.../dimos/dimos/data/xarm7/scene.xml')
        name       = 'xarm7/scene.xml'
        nested_path = PosixPath('scene.xml')
        path_parts = ('xarm7', 'scene.xml')
dimos/utils/data.py:248: in _pull_lfs_archive
    _lfs_pull(file_path, repo_root)
        file_path  = PosixPath('.../dimos/data/.lfs/xarm7.tar.gz')
        filename   = 'xarm7'
        repo_root  = PosixPath('.../work/dimos/dimos')
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

file_path = PosixPath('.../dimos/data/.lfs/xarm7.tar.gz')
repo_root = PosixPath('.../work/dimos/dimos')

    def _lfs_pull(file_path: Path, repo_root: Path, *, retries: int = 2) -> None:
        relative_path = file_path.relative_to(repo_root)
    
        env = os.environ.copy()
        env["GIT_LFS_FORCE_PROGRESS"] = "1"
    
        last_err: subprocess.CalledProcessError | None = None
        for attempt in range(1, retries + 2):  # retries + 1 total attempts
            try:
                subprocess.run(
                    ["git", "lfs", "pull", "--include", str(relative_path)],
                    cwd=repo_root,
                    check=True,
                    env=env,
                )
                return
            except subprocess.CalledProcessError as e:
                last_err = e
                if attempt <= retries:
                    time.sleep(attempt)  # 1s, 2s backoff
    
>       raise RuntimeError(
            f"Failed to pull LFS file {file_path} after {retries + 1} attempts: {last_err}"
        )
E       RuntimeError: Failed to pull LFS file .../dimos/data/.lfs/xarm7.tar.gz after 3 attempts: Command '['git', 'lfs', 'pull', '--include', 'data/.lfs/xarm7.tar.gz']' returned non-zero exit status 1.

attempt    = 3
env        = {'ACCEPT_EULA': 'Y', 'ACTIONS_ID_TOKEN_REQUEST_TOKEN': 'eyJhbGciOiJSUzI1NiIsImtpZCI6IjM4ODI2YjE3LTZhMzAtNWY5Yi1iMTY5LT...-version=2.0', 'ACTIONS_ORCHESTRATION_ID': '7743b813-42b8-41dd-b4da-87b11e424e14.tests.ubuntu-24_04-arm_3_14_fal', ...}
file_path  = PosixPath('.../dimos/data/.lfs/xarm7.tar.gz')
last_err   = CalledProcessError(1, ['git', 'lfs', 'pull', '--include', 'data/.lfs/xarm7.tar.gz'])
relative_path = PosixPath('data/.lfs/xarm7.tar.gz')
repo_root  = PosixPath('.../work/dimos/dimos')
retries    = 2

dimos/utils/data.py:216: RuntimeError


ruthwikdasyam changed the title from Ruthwik/hostedteleop/2 to feat(teleop): hosted Go2 one-session module + robot commands, multicam, battery, LiveKit on Jun 23, 2026

If POST /sessions returned non-201 or wait_connected raised, _connect
left self._pc and self._http set; the next start() overwrote them
without cleanup, orphaning the PC and the httpx connection pool. The
broker's new partial-failure 502s make this more likely.

Wrap the body in try/except; on any failure, call _disconnect (defensive
about None refs) before re-raising. Also cap the logged error body at
200 chars, since the SDP carries short-lived ICE ufrag/pwd credentials.
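A synchronous sketch of the cleanup pattern, assuming injected factories in place of the real httpx client, peer connection, and POST /sessions call (the actual code is likely async). Connector, SessionError, and the factory parameters are hypothetical; _connect, _disconnect, _pc, and _http follow the names in the commit message.

```python
from typing import Any, Callable


class SessionError(RuntimeError):
    """Hypothetical error for a rejected session."""


class Connector:
    def __init__(self, post_session: Callable[[], tuple[int, str]],
                 open_pc: Callable[[], Any], open_http: Callable[[], Any]) -> None:
        self._post_session = post_session
        self._open_pc = open_pc
        self._open_http = open_http
        self._pc: Any = None
        self._http: Any = None

    def _connect(self) -> None:
        try:
            self._http = self._open_http()
            self._pc = self._open_pc()
            status, body = self._post_session()
            if status != 201:
                # Cap the logged body: the SDP carries short-lived ICE ufrag/pwd.
                raise SessionError(f"POST /sessions -> {status}: {body[:200]}")
        except Exception:
            self._disconnect()  # never leave half-open resources for the next start()
            raise

    def _disconnect(self) -> None:
        # Defensive about None refs: the failure may precede either assignment.
        if self._pc is not None:
            self._pc.close()
            self._pc = None
        if self._http is not None:
            self._http.close()
            self._http = None
```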

If the broker force-deletes the session (delete_session full teardown,
key revocation), the heartbeat used to log a warning every tick at
heartbeat_hz forever. Track a consecutive 401/404 streak and break out
of the loop after 5. _heartbeat_once now returns the HTTP status (None
on skip) so the loop can see it.
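A sketch of the streak-based exit, assuming heartbeat_once returns the HTTP status or None on a skipped tick. heartbeat_loop, the constants, and treating None as a streak reset are assumptions; the sleep between ticks at heartbeat_hz is left out so the loop is deterministic.

```python
from typing import Callable, Optional

TERMINAL_STATUSES = {401, 404}  # revoked key / force-deleted session
MAX_TERMINAL_STREAK = 5


def heartbeat_loop(heartbeat_once: Callable[[], Optional[int]], max_ticks: int) -> int:
    """Run up to max_ticks heartbeats and return how many ticks actually ran."""
    streak = 0
    for tick in range(1, max_ticks + 1):
        status = heartbeat_once()
        if status in TERMINAL_STATUSES:
            streak += 1
            if streak >= MAX_TERMINAL_STREAK:
                return tick  # session is gone: stop instead of log-flooding
        else:
            streak = 0  # any other result breaks the streak (assumption for None)
    return max_ticks
```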

The connectionstatechange handler only logged at INFO and never reacted
to failed/closed. Combined with the heartbeat loop that used to spin
through 404s indefinitely, a robot with a dead PC kept publishing
frames into the void with no visible signal. Log terminal states at
ERROR so the orchestrator (and operators tailing logs) see them.

publish() rechecks readyState under the lock, then schedules ch.send
via call_soon_threadsafe. Between the check and the loop tick, the
channel can transition to closing (operator pagehide → broker reissues
SCTP ids → heartbeat closes), so ch.send raises InvalidStateError and
bubbles up to the default asyncio exception handler.

Wrap the send in a closure that rechecks readyState on the loop thread
and swallows the exception at debug level.
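A sketch of the check-then-recheck publish path. The channel is assumed to expose readyState and send() like aiortc's RTCDataChannel; InvalidStateError here is a local stand-in for aiortc's exception of the same name, and this publish signature is hypothetical.

```python
import asyncio
import logging
import threading

log = logging.getLogger("publish_sketch")


class InvalidStateError(Exception):
    """Local stand-in for the data-channel library's InvalidStateError."""


def publish(loop: asyncio.AbstractEventLoop, lock: threading.Lock, ch, data: bytes) -> bool:
    """Check readyState under the lock, then re-check on the loop thread before sending."""
    with lock:
        if ch.readyState != "open":
            return False

    def _send() -> None:
        # The channel may have moved to "closing" between the check above
        # and this loop tick, so check again here on the loop thread.
        if ch.readyState != "open":
            return
        try:
            ch.send(data)
        except InvalidStateError:
            log.debug("dropped send on a closing channel")

    loop.call_soon_threadsafe(_send)
    return True
```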

state_reliable's message handler calls _maybe_answer_ping, which looks
up state_reliable_back in _dcs. With the previous insert order, a
clock-sync ping that arrived between createDataChannel(state_reliable)
and the _dcs[state_reliable] = ch insert would find no
state_reliable_back yet and drop the pong with a warning.

Reorder the heartbeat's ids dict so state_reliable_back is opened
first. Python 3.7+ preserves insertion order on dict iteration, so the
for-loop processes it before state_reliable.
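A small simulation of why the order matters, assuming a ping can be handled as soon as state_reliable is registered. open_channels, make_ping_probe, and the stream ids are made up for illustration.

```python
from typing import Callable


def open_channels(ids: dict[str, int], dcs: dict[str, int],
                  on_open: Callable[[str], None]) -> None:
    """Register channels in dict order; on_open fires right after each insert."""
    for label, stream_id in ids.items():  # insertion order, guaranteed since 3.7
        dcs[label] = stream_id  # stands in for createDataChannel + _dcs insert
        on_open(label)


def make_ping_probe(dcs: dict[str, int], dropped: list[str]) -> Callable[[str], None]:
    """Simulate a clock-sync ping landing the moment state_reliable opens."""
    def on_open(label: str) -> None:
        if label == "state_reliable" and "state_reliable_back" not in dcs:
            dropped.append("pong")  # no back channel yet: the reply is lost
    return on_open
```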

set_rage_mode applied a 2s settle only when enable=True. On disable,
SwitchJoystick(False) fired immediately after RAGEMODE(False): the
FSM was still mid-transition out of Rage, the joystick toggle was
silently ignored, and the robot stayed in Rage. The user's only
recovery was disconnect + relaunch.

Apply the settle in both directions.

_rage_active was initialised to False in __init__ and only ever flipped
by the operator's set_mode click. If the firmware was left in Rage by
a previous session, the short-circuit at "want_rage == self._rage_active"
returned "already in right FSM" on every Normal/High click and the user
was locked in Rage until disconnect + relaunch.

Force set_rage_mode(False) at start. The base GO2Connection only forces
RAGE *on* (when config.mode says so); we mirror that with an explicit
off so the tracked state matches the firmware.
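A sketch combining the symmetric settle and the start-time reset, with firmware calls injected as rpc(name, enable). RageController and its return strings are hypothetical. The joystick step reflects a later commit in this PR, which makes set_rage_mode always re-enable joystick listening.

```python
from typing import Callable

RAGE_SETTLE_S = 2.0  # settle time from the commit message


class RageController:
    def __init__(self, rpc: Callable[[str, bool], None],
                 sleep: Callable[[float], None]) -> None:
        self._rpc = rpc      # hypothetical firmware call: rpc(name, enable)
        self._sleep = sleep  # injected so tests don't actually wait
        self._rage_active = False

    def set_rage_mode(self, enable: bool) -> None:
        self._rpc("RAGEMODE", enable)
        # Settle in BOTH directions: the FSM is mid-transition either way.
        self._sleep(RAGE_SETTLE_S)
        self._rpc("SwitchJoystick", True)
        self._rage_active = enable

    def start(self) -> None:
        # Firmware may still be in Rage from a previous session; force it
        # off so the tracked flag matches reality.
        self.set_rage_mode(False)

    def set_mode(self, mode: str) -> str:
        want_rage = mode == "Rage"
        if want_rage == self._rage_active:
            return "already in right FSM"  # short-circuit is now trustworthy
        self.set_rage_mode(want_rage)
        return "switched"
```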

ruthwikdasyam and others added 25 commits June 30, 2026 20:28

cmd_unreliable is lossy, unordered SCTP, so a late packet can clobber a
fresher one. The reported symptom: press Q (left strafe), release,
press W (forward); if Q's twist arrives at the robot after W's, the
robot keeps strafing left. Safety-relevant.

Override Go2HostedConnection.move() to:
- drop if operator-stamped ts is >0.5s old (existing staleness intent)
- drop if ts <= the highest ts seen on this stream (ordering)

Then delegate to super().move(). ts comes from TwistStamped's
Timestamped mixin (LCM header.stamp). _last_cmd_ts is initialised to
0.0 in __init__; first valid cmd always passes.
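A sketch of the two guards in front of super().move(), assuming ts has already been mapped into the robot's clock via the clock-sync offset. OrderedMover and forward are hypothetical stand-ins for the override and the base-class call.

```python
import time
from typing import Callable

MAX_CMD_AGE_S = 0.5


class OrderedMover:
    def __init__(self, forward: Callable[[object], None],
                 clock: Callable[[], float] = time.time) -> None:
        self._forward = forward  # stands in for super().move()
        self._clock = clock
        self._last_cmd_ts = 0.0  # first valid command always passes

    def move(self, twist: object, ts: float) -> bool:
        if self._clock() - ts > MAX_CMD_AGE_S:
            return False  # stale: older than 0.5 s
        if ts <= self._last_cmd_ts:
            return False  # out of order: a fresher twist already ran
        self._last_cmd_ts = ts
        self._forward(twist)
        return True
```

In the Q-then-W case, once W's twist (newer ts) has been applied, Q's late twist fails the ordering check and is dropped instead of steering the robot sideways.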

set_rage_mode(False) ended with SwitchJoystick(False), which puts the
robot in a static standup posture: the CoM shifts, but the FSM won't
accept velocity commands (observed: exit Rage → try WASD → nothing).

Follow SwitchJoystick(False) with BalanceStand + RecoveryStand on the
disable path. BalanceStand alone isn't enough on real hardware;
RecoveryStand reliably lands the FSM in the state that consumes
WIRELESS_CONTROLLER twists.

The Stand/Drive button did standup → sleep → BalanceStand and acked ok,
but on real hardware, after transitions from Sit / Rage / StandDown, the
FSM wasn't accepting velocity and WASD did nothing. Only a separate
Recovery press worked.

Append sport_command(RecoveryStand) after BalanceStand so the
combined Stand/Drive posture lands the robot in the accepts-velocity
state directly.

…c standup"

Unsure whether SwitchJoystick(False) actually leaves the FSM in the
wrong state or the added BalanceStand + RecoveryStand nudges it into
a different unintended state. Roll back for now; keep the symmetric
2s settle from the earlier commit.

This reverts commit 6d5f722.

_dispatch fanned out to subscribers but never answered pings, so on
LiveKit robots the operator's state.bestRttMs stayed Infinity,
clockOffsetMs stayed 0, and command timestamps went on the wire in
raw operator wall-clock time. The robot's move() staleness check then
compared time.time() to an operator-clock ts and either dropped
everything or accepted stale packets, depending on host clock skew.

Port _maybe_answer_ping from BrokerProvider verbatim: on
topic=state_reliable, decode the ping and publish
{type:"pong", client_ts:<echo>, robot_ts: time.time()} on
state_reliable_back, reliable, before subscriber fanout.
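A sketch of the ping answer, assuming JSON payloads and a ping that carries client_ts (the commit spells out only the pong fields). This maybe_answer_ping signature and the publish callback are hypothetical.

```python
import json
import time
from typing import Callable


def maybe_answer_ping(topic: str, payload: bytes,
                      publish: Callable[[str, bytes], None],
                      clock: Callable[[], float] = time.time) -> bool:
    """Answer a clock-sync ping before subscriber fan-out; True if answered."""
    if topic != "state_reliable":
        return False
    try:
        msg = json.loads(payload)
    except ValueError:  # covers JSONDecodeError and bad UTF-8
        return False
    if not isinstance(msg, dict) or msg.get("type") != "ping":
        return False
    pong = {"type": "pong", "client_ts": msg.get("client_ts"), "robot_ts": clock()}
    publish("state_reliable_back", json.dumps(pong).encode())
    return True
```

With client_ts echoed back, the operator can measure RTT against its own clock and relate robot_ts to its local time, presumably what bestRttMs and clockOffsetMs track.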

The comment above the cmd_vel auto-stop timer claimed 0.5 seconds while
cmd_vel_timeout is 0.2. It's the safety-critical constant for hosted
teleop (it halts the base when the operator link drops mid-drive), so the
comment must not lie about it. (HARDENING_PLAN B4 in dimensional-teleop.)

Every sport_cmd/set_mode/obstacle_avoidance spawned an unbounded daemon
thread, and _rage_active was read on the callback thread while a previous
toggle's thread wrote it (rapid toggles could double-fire the firmware).

- single-worker executor (repo pattern, cf. utils/threadpool and
  drake_world): commands run strictly in order; the rage check moves
  inside the serialized task, so the race is gone by construction
- bounded backlog: past 4 pending, commands are busy-rejected with
  ack ok=false instead of piling up threads
- Damp is the E-STOP: it bypasses the queue on a dedicated thread so a
  stop never waits behind a ~3.3s StandReady holding the worker

Tests: ordering, backlog rejection, and the Damp bypass; executor reaper
fixture keeps the repo thread-leak check green.
(HARDENING_PLAN B1+B3 in dimensional-teleop.)
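A sketch of the serialized runner built on the standard library's ThreadPoolExecutor with one worker. CommandRunner is hypothetical, and MAX_PENDING counts queued plus running commands, which is one reading of "past 4 pending".

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Optional

MAX_PENDING = 4  # backlog bound from the commit message


class CommandRunner:
    """Serialize robot commands on one worker; run the E-stop off-queue."""

    def __init__(self) -> None:
        self._pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="robot-cmd")
        self._lock = threading.Lock()
        self._pending = 0  # queued + running

    def submit(self, fn: Callable[[], Any]) -> Optional[Future]:
        """Queue fn, or return None (caller acks ok=false) when backlogged."""
        with self._lock:
            if self._pending >= MAX_PENDING:
                return None
            self._pending += 1
        fut = self._pool.submit(fn)
        fut.add_done_callback(self._on_done)
        return fut

    def _on_done(self, _fut: Future) -> None:
        with self._lock:
            self._pending -= 1

    def estop(self, damp: Callable[[], None]) -> threading.Thread:
        """Damp on a dedicated thread so it never waits behind e.g. StandReady."""
        t = threading.Thread(target=damp, name="robot-estop", daemon=True)
        t.start()
        return t

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)
```

Because every command, including the rage toggle, runs on the same worker, reads and writes of _rage_active inside those tasks need no extra lock.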

Transport/UI duplicates of a nonce'd command (sport_cmd / set_mode /
obstacle_avoidance) within a 10s window re-ack the prior result instead
of re-executing; in-flight duplicates are dropped (the original acks).
Busy rejections and shutdown races unwind the reservation so genuine
retries still run.

Scope: JSON action commands only — cmd_vel twists carry no nonce and are
already guarded by the monotonic-timestamp drop in move(). TTL is short
on purpose: browser nonces restart at 1 per operator session.
(HARDENING_PLAN B2 in dimensional-teleop.)
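A sketch of reservation-based dedup, assuming integer nonces and a TTL measured from the first reservation. NonceCache and its three outcomes are hypothetical names; release() is the unwind path for busy rejections and shutdown races.

```python
import time
from typing import Any, Callable, Optional

DEDUP_TTL_S = 10.0
_IN_FLIGHT = object()  # sentinel: reserved, result not known yet


class NonceCache:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._seen: dict[int, tuple[float, Any]] = {}

    def reserve(self, nonce: int) -> tuple[str, Optional[Any]]:
        """Return ("run", None), ("drop", None), or ("reack", prior_ack)."""
        now = self._clock()
        # Forget entries older than the TTL.
        self._seen = {n: v for n, v in self._seen.items() if now - v[0] < DEDUP_TTL_S}
        if nonce in self._seen:
            result = self._seen[nonce][1]
            if result is _IN_FLIGHT:
                return "drop", None  # the original execution will ack
            return "reack", result
        self._seen[nonce] = (now, _IN_FLIGHT)
        return "run", None

    def complete(self, nonce: int, ack: Any) -> None:
        if nonce in self._seen:
            self._seen[nonce] = (self._seen[nonce][0], ack)

    def release(self, nonce: int) -> None:
        """Unwind a reservation so a genuine retry can still run."""
        self._seen.pop(nonce, None)

    def clear(self) -> None:
        self._seen.clear()
```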

Mirrors BrokerProvider's terminal condition: a revoked key or a
force-deleted session would otherwise retry and log-flood at
heartbeat_hz forever. Any other status (or network error) resets the
streak. (HARDENING_PLAN B5 in dimensional-teleop.)

Two safety gaps from the teleop audit: E-STOP was just an allow-listed
Damp (no latch: twists kept flowing after it), and nothing reacted when
the operator's command plane vanished (in-flight commands ran to
completion; only the 0.2s cmd_vel deadman covered the base).

- {"type":"estop"}: latch FIRST (move() refuses twists instantly,
  non-urgent commands rejected), then Damp on the urgent path, never
  queued behind a slow StandReady. {"type":"estop_clear"} re-arms
  without moving the robot.
- operator-lost: both providers deliver a synthetic
  {"type":"operator_lost"} to state_reliable subscribers. CF detects
  the state channel id disappearing on heartbeat; LiveKit uses
  participant_disconnected. The module zeros the base (stop_movement),
  clears the per-session nonce cache, and optionally Damps
  (config.damp_on_operator_lost, off by default: a WiFi blip shouldn't
  drop the robot mid-patrol).

Tests: latch/refuse/re-arm, clear-does-not-move, operator-lost stop +
nonce reset, config-gated damp.
(HARDENING_PLAN A2 in dimensional-teleop.)
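A sketch of the latch ordering and the operator-lost reaction, with damp and stop_movement injected. SafetyLatch is hypothetical; nonce-cache clearing and the urgent-path threading for Damp are omitted here.

```python
from typing import Callable


class SafetyLatch:
    def __init__(self, damp: Callable[[], None], stop_movement: Callable[[], None],
                 damp_on_operator_lost: bool = False) -> None:
        self._damp = damp
        self._stop_movement = stop_movement
        self._damp_on_operator_lost = damp_on_operator_lost
        self.estopped = False

    def on_message(self, msg: dict) -> None:
        kind = msg.get("type")
        if kind == "estop":
            self.estopped = True  # latch first: move() refuses from now on
            self._damp()          # then Damp
        elif kind == "estop_clear":
            self.estopped = False  # re-arm only; does not move the robot
        elif kind == "operator_lost":
            self._stop_movement()  # zero the base
            if self._damp_on_operator_lost:
                self._damp()

    def allow_twist(self) -> bool:
        return not self.estopped
```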

robot_telemetry gains a state dict covering posture (tracked from
successful posture commands; gestures don't touch it), rage, obstacle
avoidance, camera selection, and the E-STOP latch, so a (re)connecting
operator's cockpit seeds from reality instead of optimistic defaults (it
assumed StandReady + OA on, and couldn't know rage/cams at all).

Telemetry now always publishes (state is always meaningful, not just
when cmd stats/battery exist); payload built by _telemetry_payload()
for testability. Roadmap Phase 2 item.
(HARDENING_PLAN A3 in dimensional-teleop.)
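A sketch of a payload builder that always carries state, assuming flat key names, an "unknown" initial posture, and optional battery and command stats. RobotState, track_posture, and the keys are hypothetical; the commit's own builder is _telemetry_payload().

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional

POSTURES = {"StandReady", "Sit", "StandDown", "RecoveryStand"}


@dataclass
class RobotState:
    posture: str = "unknown"  # hypothetical initial value
    rage: bool = False
    obstacle_avoidance: bool = True
    camera: str = "cam1"
    estop: bool = False


def track_posture(state: RobotState, name: str, ok: bool) -> None:
    """Only successful posture commands move the tracked posture; gestures don't."""
    if ok and name in POSTURES:
        state.posture = name


def telemetry_payload(state: RobotState, battery_soc: Optional[float] = None,
                      cmd_stats: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """State is always present; battery and command stats only when known."""
    payload: dict[str, Any] = {"type": "robot_telemetry", "state": asdict(state)}
    if battery_soc is not None:
        payload["battery_soc"] = battery_soc
    if cmd_stats:
        payload["cmd_stats"] = cmd_stats
    return payload
```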

…dec pref, LiveKit encoding

The video pipeline ran wide open (aiortc defaults, source rate/resolution);
congestion surfaced as encoder drops and freezes instead of a bounded
stream. All knobs are opt-in (0/empty = today's behavior):

- Go2HostedConnectionConfig.video_max_fps / video_max_width: cap at the
  mux. The fps gate runs before any composite work; the downscale runs
  before the latency stamp so its 16px cells stay decodable. Holds on both transports.
- BrokerConfig.video_codec: reorder aiortc codec preferences (e.g. h264
  first) on the video transceiver; unknown codec falls back to defaults.
- LiveKitBrokerConfig.video_max_bitrate_bps / video_max_fps →
  TrackPublishOptions.video_encoding on the published track.

Tests: fps cap, width downscale (aspect preserved), zero-config passthrough.
(HARDENING_PLAN E1 in dimensional-teleop.)
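A sketch of the two mux-side caps, assuming timestamps in seconds and 0 meaning "no cap". FrameGate is hypothetical; it decides admission before any compositing and computes an aspect-preserving target size without upscaling.

```python
from typing import Optional


class FrameGate:
    """Mux-side caps; 0 disables a cap, matching the zero-config default."""

    def __init__(self, max_fps: float = 0.0, max_width: int = 0) -> None:
        self.min_interval = 1.0 / max_fps if max_fps > 0 else 0.0
        self.max_width = max_width
        self._last_emit: Optional[float] = None

    def admit(self, t: float) -> bool:
        """FPS gate: called before any composite work on the frame."""
        too_soon = (
            self.min_interval > 0
            and self._last_emit is not None
            and t - self._last_emit < self.min_interval
        )
        if too_soon:
            return False
        self._last_emit = t
        return True

    def target_size(self, width: int, height: int) -> tuple[int, int]:
        """Aspect-preserving downscale to max_width; never upscales."""
        if self.max_width <= 0 or width <= self.max_width:
            return width, height
        scale = self.max_width / width
        return self.max_width, max(1, round(height * scale))
```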

- set_light(on) on the Go2 connection protocol: VUI brightness API
  (api_id 1005, level 10/0; the same service color() already drives);
  no-op stubs on replay/mujoco/dimsim
- hosted: {"type":"light", "enabled", "nonce"} handled like the
  obstacle-avoidance toggle (serialized worker, acked); _light_on
  tracked and pushed in robot_telemetry.state so the cockpit reconciles

Tests: toggle updates state + acks; failed RPC leaves state unchanged.

set_light now takes the firmware-native VUI level (0-10, clamped); the
hosted handler accepts {"type":"light", "brightness": 0..1} (validated,
NaN-rejected, clamped) and maps it to levels. The already-deployed
toggle's {"enabled": bool} still works (mapped to full/zero brightness),
so the live frontend keeps functioning until the slider UI ships.
robot_telemetry.state.light is now the 0..1 float.

Tests: level mapping, legacy bool compat, clamp + malformed rejection,
failed-RPC state retention.
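A sketch of the message-to-level mapping, assuming rounding to the nearest VUI level. light_level is hypothetical; it returns None for malformed input so the caller can reject the command.

```python
import math
from typing import Any, Optional

MAX_VUI_LEVEL = 10  # firmware-native VUI range is 0-10 per the commit


def light_level(msg: dict[str, Any]) -> Optional[int]:
    """Map a light message to a VUI level, or None if it is malformed."""
    if "brightness" in msg:
        b = msg["brightness"]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(b, bool) or not isinstance(b, (int, float)) or math.isnan(b):
            return None
        b = min(1.0, max(0.0, float(b)))  # clamp to 0..1
        return round(b * MAX_VUI_LEVEL)
    if isinstance(msg.get("enabled"), bool):  # legacy toggle
        return MAX_VUI_LEVEL if msg["enabled"] else 0
    return None
```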

…t disabled

Root cause: set_rage_mode ended with SwitchJoystick(enable), so turning
rage OFF disabled firmware joystick listening, and the hosted module
force-resets rage OFF at every start(). move()'s WIRELESS_CONTROLLER
stick emulation was then silently ignored: keys pressed, robot deaf.

- switch_joystick(enable) extracted as a proper connection method
  (protocol + sim stubs); set_rage_mode now always re-enables it, since
  rage only changes the speed envelope, never joystick listening
- Stand/Drive combo reordered to END drive-ready:
  standup → RecoveryStand (recover from Sit/Damp) → BalanceStand →
  SwitchJoystick(True). The old order ended in RecoveryStand, which
  is not a stick-accepting FSM state.

Test pins the exact call order and the joystick re-enable.
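A sketch of the reordered combo that stops at the first failed step, with each step injected as call(name). stand_drive and the step strings are hypothetical, and the real sequence may add settle delays between steps.

```python
from typing import Callable

STAND_DRIVE_STEPS = ("standup", "RecoveryStand", "BalanceStand", "SwitchJoystick(True)")


def stand_drive(call: Callable[[str], bool]) -> bool:
    """Run the combo in order and end drive-ready; stop on the first failure."""
    for step in STAND_DRIVE_STEPS:
        if not call(step):
            return False
    return True
```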