fix: stop dashboard restart loop and harden default API server key #26
Open
santteegt wants to merge 1 commit into
The dashboard s6 service was crash-looping forever: HERMES_DASHBOARD_HOST binds to 0.0.0.0 (required so hermes-agent.dappnode:8081 is reachable from the DAppNode network) and HERMES_DASHBOARD_INSECURE no longer bypasses the auth gate on non-loopback binds (upstream hardening, June 2026, closing an exploited hole where scanners found --insecure --host 0.0.0.0 dashboards and planted SSH-key-injecting MCP persistence). With no auth provider configured, hermes dashboard hard-exits on every start, and the s6 finish script restarts it unconditionally, flooding the logs.

Configure the bundled username/password auth provider with static credentials (matching this package's existing convention for API_SERVER_KEY and the already-unauthenticated ttyd terminal - the real trust boundary here is the DAppNode network/VPN, not this credential), plus a pinned signing secret and a 30-day session so the one login persists across restarts.

Separately, the API server (port 3000) was refusing to start for the same reason: upstream now rejects any API_SERVER_KEY under 16 chars as a guessable placeholder (that endpoint dispatches terminal-capable agent work). Replaced with a strong static key.

Known limitation (see PR description): actually logging in still 500s due to an unrelated upstream regression in the dashboard's auto-SSO redirect, tracked upstream and not yet released - the crash loop is fixed, but full dashboard login needs that fix to ship first.
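As a rough illustration of the key-length point in the commit message, a replacement key could be produced along these lines. This is a minimal sketch, not the method used in this PR: `generate_api_key`, `is_long_enough`, and `MIN_KEY_LENGTH` are hypothetical names, and the check only mirrors the 16-character minimum reported above rather than upstream's actual validation code.

```python
import secrets

# Minimum length the PR reports upstream enforcing for API_SERVER_KEY.
MIN_KEY_LENGTH = 16


def generate_api_key(num_bytes: int = 32) -> str:
    """Return a random hex key; 32 random bytes yield 64 hex characters."""
    return secrets.token_hex(num_bytes)


def is_long_enough(key: str) -> bool:
    """Length check only; it says nothing about how guessable the key is."""
    return len(key) >= MIN_KEY_LENGTH


key = generate_api_key()
print(len(key))                    # 64
print(is_long_enough("dappnode"))  # False: the old default is 8 characters
print(is_long_enough(key))         # True
```

A key generated this way is still a static shared secret once committed to `docker-compose.yml`, so it clears the length check without changing the trust boundary described above.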
Summary
The package was stuck in an infinite restart loop on init: the `dashboard` s6 service kept crashing and being respawned, flooding the logs with a `Refusing to bind dashboard to 0.0.0.0` error.

- `docker-compose.yml` binds `HERMES_DASHBOARD_HOST` to `0.0.0.0` (required so `hermes-agent.dappnode:8081` is reachable from the DAppNode network) and sets `HERMES_DASHBOARD_INSECURE: "true"`.
- Upstream hardened this in June 2026 (closing an exploited hole where scanners found `--insecure --host 0.0.0.0` dashboards and used the exposed admin API to plant SSH-key-injecting MCP persistence): `HERMES_DASHBOARD_INSECURE` no longer bypasses the gate, and any non-loopback bind with no registered `DashboardAuthProvider` now hard-exits (`SystemExit`) instead of degrading gracefully.
- Because the `dashboard` service's s6 `finish` script treats "dashboard enabled" as "always restart on crash," that `SystemExit` became the infinite loop.

Separately, once the loop is fixed, the OpenAI-compatible API server (port 3000) fails to start for the same underlying reason: `API_SERVER_KEY: dappnode` is 8 characters, and upstream now refuses to start that endpoint with any key under 16 chars (it dispatches terminal-capable agent work, so a guessable key amounts to RCE).

Changes
`docker-compose.yml`

- Removed `HERMES_DASHBOARD_INSECURE` (accepted but ignored upstream, only produces a misleading warning).
- Configured the bundled username/password (`basic`) `DashboardAuthProvider` with static credentials (`HERMES_DASHBOARD_BASIC_AUTH_USERNAME` / `_PASSWORD`), matching this package's existing convention of static shared secrets scoped to the DAppNode network/VPN trust boundary (see `API_SERVER_KEY: dappnode`, `GATEWAY_ALLOW_ALL_USERS: true`, and the already-unauthenticated `ttyd` web terminal on port 7681).
- Pinned `HERMES_DASHBOARD_BASIC_AUTH_SECRET` (session-signing key) and set `HERMES_DASHBOARD_BASIC_AUTH_TTL_SECONDS` to 30 days, so the one login persists across container restarts instead of invalidating every boot.
- Replaced `API_SERVER_KEY: dappnode` with a strong 64-char hex key so the API server actually starts.

`getting-started.md`: documents the default dashboard login and how to change it.

Not included in this PR: a version bump to `dappnode_package.json`, and a local build-time patch (`patch-dashboard-auto-sso.py`) that works around the upstream regression described below by rewriting the vendored `middleware.py` at image-build time. Both are being held back deliberately; see the note below.

Configuring the `basic` auth provider is necessary but not sufficient for a working dashboard login, due to an unrelated regression in `hermes-agent` itself: `hermes_cli/dashboard_auth/middleware.py`'s `_auto_sso_response()` auto-redirects any unauthenticated page load straight to the OAuth-only `/auth/login` route whenever exactly one auth provider is registered, but it never checks whether that provider actually supports an OAuth redirect flow. With only the `basic` (password-only) provider configured, this lands on `BasicAuthProvider.start_login()`, which unconditionally raises `NotImplementedError`. Result: every unauthenticated page load 500s, before the user ever reaches the working `/login` credential form.

This fixes the crash loop reported here (the `dashboard` service now starts and stays up), but a real login attempt will still 500 until the upstream regression is fixed and released.

- Upstream issue: "… `start_login()` on a password-only provider" (opened 2026-06-29, the same day the regressing auto-SSO feature was merged).
- Upstream fix: a `supports_password` guard in both `_auto_sso_response` and the `/auth/login` route itself (defense in depth against stale links), with regression tests in both directions.
- Alternative upstream fix: a guard in `_auto_sso_response` alone, also with a regression test.

Recommendation: hold this PR until one of the upstream fixes above lands in a released `hermes-agent` version, then bump `UPSTREAM_VERSION` (and `dappnode_package.json`'s `version`) in the same change so the dashboard is actually usable end-to-end, not just non-crashing. A local monkey-patch of the vendored `middleware.py` at Docker build time was considered and prototyped, but deliberately left out of this PR: patching vendored upstream source in our Dockerfile is fragile across upstream bumps and duplicates work already in flight upstream, so it is better to wait for the real fix to ship.

Test plan
- Built with `npx @dappnode/dappnodesdk build --provider http://ipfs.dappnode:5001` and confirmed the `dashboard` s6 service starts and stays up (no more restart loop) and the API server starts without the `API_SERVER_KEY` refusal.
- Confirmed `http://hermes-agent.dappnode:8081` no longer loops (dashboard login itself will still 500 until the upstream fix above ships; expected per the note above).
- Once `UPSTREAM_VERSION` picks up the upstream fix, confirm logging in with `dappnode` / `dappnode` reaches the authenticated dashboard and the session survives a container restart.
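For reviewers who want a concrete picture of the auto-SSO regression described in the note above, the following is a simplified standalone model, not upstream's code. The `Provider` class, both `auto_sso_target_*` functions, `handle`, and the `idp.example` URL are hypothetical. The meaning given to `supports_password` (a provider that logs in through a credential form and has no redirect flow) is an assumption based on the guard named in this description.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    """Hypothetical stand-in for a dashboard auth provider."""
    name: str
    supports_password: bool  # assumed: credential form only, no OAuth redirect

    def start_login(self) -> str:
        # Models the reported behavior: a password-only provider has no
        # redirect flow to start, so this call fails.
        if self.supports_password:
            raise NotImplementedError(f"{self.name} has no OAuth redirect flow")
        return f"https://idp.example/authorize?provider={self.name}"


def auto_sso_target_unguarded(providers):
    """Regressed behavior: a single provider always goes to the OAuth-only route."""
    if len(providers) == 1:
        return "/auth/login"
    return "/login"


def auto_sso_target_guarded(providers):
    """Guarded behavior: auto-redirect only when the single provider can redirect."""
    if len(providers) == 1 and not providers[0].supports_password:
        return "/auth/login"
    return "/login"


def handle(path, providers):
    """Return an HTTP-like status for an unauthenticated request to `path`."""
    if path == "/auth/login":
        try:
            providers[0].start_login()
        except NotImplementedError:
            return 500  # unhandled error surfaces as a server error
        return 302      # redirect to the identity provider
    return 200          # the credential form renders


basic_only = [Provider("basic", supports_password=True)]
print(handle(auto_sso_target_unguarded(basic_only), basic_only))  # 500
print(handle(auto_sso_target_guarded(basic_only), basic_only))    # 200
```

In this model the guard changes only the single password-only case; a lone OAuth-capable provider is still redirected as before.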