fix: stop dashboard restart loop and harden default API server key #26

Open
santteegt wants to merge 1 commit into dappnode:main from santteegt:fix/dashboard-crash-loop-auth-gate

santteegt commented Jul 3, 2026

Summary

The package was stuck in an infinite restart loop on init: the dashboard s6 service kept crashing and being respawned, flooding the logs with a "Refusing to bind dashboard to 0.0.0.0" error.

  • docker-compose.yml binds HERMES_DASHBOARD_HOST to 0.0.0.0 (required so hermes-agent.dappnode:8081 is reachable from the DAppNode network) and sets HERMES_DASHBOARD_INSECURE: "true".
  • Upstream hardened the dashboard auth gate in June 2026 (closing a real, exploited hole where attackers scanned for --insecure --host 0.0.0.0 dashboards and used the exposed admin API to plant SSH-key-injecting MCP persistence): HERMES_DASHBOARD_INSECURE no longer bypasses the gate, and any non-loopback bind with no registered DashboardAuthProvider now hard-exits (SystemExit) instead of degrading gracefully.
  • Because the dashboard service's s6 finish script treats "dashboard enabled" as "always restart on crash," that SystemExit became the infinite loop.
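The interaction between the auth gate and the supervisor can be illustrated with a small model. This is only a sketch under stated assumptions: `is_loopback`, `check_dashboard_bind`, and `supervise` are hypothetical names invented for illustration, not upstream hermes-agent or s6 code, and the real gate may apply further checks.

```python
import ipaddress


def is_loopback(host: str) -> bool:
    """Return True only for loopback binds such as 127.0.0.1, ::1, or localhost."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Anything that is not a literal IP is treated as non-loopback here.
        return False


def check_dashboard_bind(host: str, auth_providers: list) -> None:
    """Hypothetical gate: hard-exit on a non-loopback bind with no auth provider."""
    if not is_loopback(host) and not auth_providers:
        raise SystemExit(f"Refusing to bind dashboard to {host}")


def supervise(start, max_restarts: int) -> int:
    """Toy supervisor that restarts on every exit; returns the number of crashes.

    A real unconditional finish script has no restart cap; max_restarts only
    keeps this model finite.
    """
    crashes = 0
    for _ in range(max_restarts):
        try:
            start()
            break  # the service started and stayed up
        except SystemExit:
            crashes += 1  # restart unconditionally
    return crashes


if __name__ == "__main__":
    # No provider on 0.0.0.0: every start attempt exits.
    print(supervise(lambda: check_dashboard_bind("0.0.0.0", []), 5))
    # One provider registered: the first start succeeds.
    print(supervise(lambda: check_dashboard_bind("0.0.0.0", ["basic"]), 5))
```

In this model the loop ends only when a provider is registered or the bind becomes loopback, which matches the two options described in the summary.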

Separately, once the loop is fixed, the OpenAI-compatible API server (port 3000) still fails to start because of related upstream hardening: API_SERVER_KEY: dappnode is 8 characters, and upstream now refuses to start that endpoint with any key under 16 characters (it dispatches terminal-capable agent work, so a guessable key amounts to RCE).
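The length rule described here can be expressed as a one-line check. A minimal sketch, assuming the only criterion is the 16-character minimum stated above; `MIN_KEY_LENGTH` and `api_key_is_acceptable` are hypothetical names, and the actual upstream check may also reject specific placeholder values.

```python
MIN_KEY_LENGTH = 16  # threshold stated in this PR description


def api_key_is_acceptable(key: str) -> bool:
    """Return True if the key meets the minimum length (length is the only rule modeled)."""
    return len(key) >= MIN_KEY_LENGTH


if __name__ == "__main__":
    print(api_key_is_acceptable("dappnode"))  # the old 8-character default
    print(api_key_is_acceptable("0" * 64))    # a 64-character key passes the length rule
```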

Changes

  • docker-compose.yml
    • Removed the now-inert HERMES_DASHBOARD_INSECURE (accepted but ignored upstream, only produces a misleading warning).
    • Configured the bundled username/password (basic) DashboardAuthProvider with static credentials (HERMES_DASHBOARD_BASIC_AUTH_USERNAME/_PASSWORD), matching this package's existing convention of static shared secrets scoped to the DAppNode network/VPN trust boundary (see API_SERVER_KEY: dappnode, GATEWAY_ALLOW_ALL_USERS: true, and the already-unauthenticated ttyd web terminal on port 7681).
    • Pinned HERMES_DASHBOARD_BASIC_AUTH_SECRET (session-signing key) and set HERMES_DASHBOARD_BASIC_AUTH_TTL_SECONDS to 30 days, so the one login persists across container restarts instead of invalidating every boot.
    • Replaced the too-short API_SERVER_KEY: dappnode with a strong 64-char hex key so the API server actually starts.
  • getting-started.md — documents the default dashboard login and how to change it.
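For reference, one way to produce values of the shape described in the list above is sketched here. It assumes a 64-character hex string is generated from 32 random bytes and that the TTL is expressed in seconds; the environment variable names come from this PR, while `new_hex_key` and `render_env` are hypothetical helpers, and the 32-byte size for the signing secret is an assumption rather than an upstream requirement.

```python
import secrets

# 30 days expressed in seconds, for HERMES_DASHBOARD_BASIC_AUTH_TTL_SECONDS.
THIRTY_DAYS_SECONDS = 30 * 24 * 60 * 60


def new_hex_key(num_bytes: int = 32) -> str:
    """Return a random hex string; 32 bytes yields 64 hex characters."""
    return secrets.token_hex(num_bytes)


def render_env() -> dict:
    """Build candidate values to pin in docker-compose.yml (illustrative only)."""
    return {
        "API_SERVER_KEY": new_hex_key(),
        "HERMES_DASHBOARD_BASIC_AUTH_SECRET": new_hex_key(),  # size is an assumption
        "HERMES_DASHBOARD_BASIC_AUTH_TTL_SECONDS": str(THIRTY_DAYS_SECONDS),
    }


if __name__ == "__main__":
    for name, value in render_env().items():
        print(f"{name}: {value}")
```

The values would be generated once and then pinned, since regenerating the signing secret on every boot would invalidate existing sessions, which is the behavior the pinned secret is meant to avoid.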

Not included in this PR: a version bump to dappnode_package.json, and a local build-time patch (patch-dashboard-auto-sso.py) that works around the upstream regression described below by rewriting the vendored middleware.py at image-build time. Both are being held back deliberately — see the note below.

⚠️ External fix required before merging

Configuring the basic auth provider is necessary but not sufficient for a working dashboard login, due to an unrelated regression in hermes-agent itself:

hermes_cli/dashboard_auth/middleware.py's _auto_sso_response() auto-redirects any unauthenticated page load straight to the OAuth-only /auth/login route whenever exactly one auth provider is registered — but it never checks whether that provider actually supports an OAuth redirect flow. With only the basic (password-only) provider configured, this lands on BasicAuthProvider.start_login(), which unconditionally raises NotImplementedError. Result: every unauthenticated page load 500s, before the user ever reaches the working /login credential form.
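The failure mode can be reproduced in a simplified model. This is a sketch, not the upstream middleware: the provider classes, the `supports_redirect_login` flag, and both handler functions are hypothetical stand-ins, and the guarded variant is just one possible shape for a fix, not necessarily the one upstream will ship.

```python
class BasicAuthProvider:
    """Stand-in for a password-only provider with no OAuth redirect flow."""

    supports_redirect_login = False  # hypothetical capability flag

    def start_login(self) -> str:
        raise NotImplementedError("basic auth has no redirect login flow")


class OAuthProvider:
    """Stand-in for a provider that does support a redirect flow."""

    supports_redirect_login = True

    def start_login(self) -> str:
        return "https://idp.example/authorize"  # placeholder URL


def handle_unauthenticated_unguarded(providers: list) -> str:
    """Models the regression: a single provider is always sent to start_login()."""
    if len(providers) == 1:
        return providers[0].start_login()
    return "/login"


def handle_unauthenticated_guarded(providers: list) -> str:
    """Only auto-redirect when the single provider supports a redirect flow."""
    if len(providers) == 1 and getattr(providers[0], "supports_redirect_login", False):
        return providers[0].start_login()
    return "/login"  # fall back to the credential form


if __name__ == "__main__":
    print(handle_unauthenticated_guarded([BasicAuthProvider()]))
    print(handle_unauthenticated_guarded([OAuthProvider()]))
```

In the unguarded version the `NotImplementedError` propagates out of the request handler, which corresponds to the 500 described above.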

This PR fixes the crash loop reported here (the dashboard service now starts and stays up), but a real login attempt will still return a 500 until the upstream regression is fixed and released.

Recommendation: hold this PR until an upstream fix for the regression above lands in a released hermes-agent version, then bump UPSTREAM_VERSION (and dappnode_package.json's version) in the same change so the dashboard is actually usable end-to-end, not just non-crashing. A local monkey-patch of the vendored middleware.py at Docker build time was considered and prototyped, but deliberately left out of this PR: patching vendored upstream source in our Dockerfile is fragile across upstream bumps and duplicates work already in flight upstream, so it is better to wait for the real fix to ship.

Test plan

  • Rebuilt the package locally via npx @dappnode/dappnodesdk build --provider http://ipfs.dappnode:5001 and confirmed the dashboard s6 service starts and stays up (no more restart loop) and the API server starts without the API_SERVER_KEY refusal.
  • Install on a DAppNode and confirm http://hermes-agent.dappnode:8081 no longer loops (dashboard login itself will still 500 until the upstream fix above ships — expected per the note above).
  • Once UPSTREAM_VERSION picks up the upstream fix, confirm logging in with dappnode/dappnode reaches the authenticated dashboard and the session survives a container restart.

The dashboard s6 service was crash-looping forever: HERMES_DASHBOARD_HOST
binds to 0.0.0.0 (required so hermes-agent.dappnode:8081 is reachable from
the DAppNode network) and HERMES_DASHBOARD_INSECURE no longer bypasses the
auth gate on non-loopback binds (upstream hardening, June 2026, closing an
exploited hole where scanners found --insecure --host 0.0.0.0 dashboards and
planted SSH-key-injecting MCP persistence). With no auth provider
configured, hermes dashboard hard-exits on every start, and the s6 finish
script restarts it unconditionally, flooding the logs.

Configure the bundled username/password auth provider with static
credentials (matching this package's existing convention for
API_SERVER_KEY and the already-unauthenticated ttyd terminal - the real
trust boundary here is the DAppNode network/VPN, not this credential), plus
a pinned signing secret and a 30-day session so the one login persists
across restarts.

Separately, the API server (port 3000) was refusing to start for the same
reason: upstream now rejects any API_SERVER_KEY under 16 chars as a
guessable placeholder (that endpoint dispatches terminal-capable agent
work). Replaced with a strong static key.

Known limitation (see PR description): actually logging in still 500s due
to an unrelated upstream regression in the dashboard's auto-SSO redirect,
tracked upstream and not yet released - the crash loop is fixed, but full
dashboard login needs that fix to ship first.