Epic: Host identity, topology immutability, and pool lifecycle refactor #921

Description

@dkropachev

Summary

This epic tracks the driver refactor needed to stop treating endpoint/address as host identity, make host_id the canonical identity, and put topology and pool lifecycle changes behind explicit, testable paths.

Today Host.endpoint is both mutable topology/address state and part of Host equality/hash behavior. Metadata, control connection refresh, session pools, load-balancing policies, client routes, and async pool creation can observe different versions of host state. That makes endpoint changes unsafe and makes it possible for stale pools or connections to publish work after the host has moved or been replaced.
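
A minimal sketch of the hazard, assuming a simplified stand-in class (`EndpointKeyedHost` is hypothetical and is not the driver's real `Host`): once an object whose hash depends on its endpoint is used as a dict key, mutating the endpoint leaves the entry filed under the old hash.

```python
class EndpointKeyedHost:
    """Hypothetical stand-in: equality and hash both depend on endpoint."""

    def __init__(self, endpoint, host_id):
        self.endpoint = endpoint
        self.host_id = host_id

    def __eq__(self, other):
        return (isinstance(other, EndpointKeyedHost)
                and self.endpoint == other.endpoint)

    def __hash__(self):
        return hash(self.endpoint)


host = EndpointKeyedHost("10.0.0.1:9042", "a1b2")
pools = {host: "pool-for-10.0.0.1"}

# The node keeps its host_id but moves to a new address.
host.endpoint = "10.0.0.2:9042"

# The entry is still stored, but under the old hash, so a lookup with
# the very same object misses (barring a full hash collision).
found = host in pools
still_stored = any(key is host for key in pools)
```

Here `found` ends up false while `still_stored` is true, which is the kind of orphaned pool entry the epic is meant to rule out.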

Target architecture

  • host_id is the only stable host identity.
  • endpoint is connectivity/address metadata, not identity.
  • Endpoint lookup remains available only as a secondary index for discovery/event translation, for example _host_id_by_endpoint.
  • A same-host_id, different-endpoint update is handled through one explicit endpoint replacement flow.
  • A same-endpoint, different-host_id observation is treated as node replacement/rebirth, not the same host.
  • Host.__eq__ and Host.__hash__ must stop depending on mutable endpoint state.
  • Host identity/topology fields should become immutable from normal driver code after the replacement/update flow exists.
  • Runtime health remains mutable: up/down state, conviction policy state, reconnection handler, and pool health can still change.
  • Session pools and connections must be fenced by topology generation/version so stale async work cannot publish into current state.
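
The identity and index rules above could look roughly like the following sketch. The `Host` and `Metadata` classes here are simplified, hypothetical versions written for illustration; `observe` and its return values are assumptions, and only `_host_id_by_endpoint` is a name taken from this issue.

```python
import uuid


class Host:
    """Hypothetical host whose identity is host_id only."""

    def __init__(self, host_id, endpoint):
        self._host_id = host_id
        self.endpoint = endpoint  # connectivity metadata, not identity
        self.is_up = True         # runtime health stays mutable

    @property
    def host_id(self):
        return self._host_id

    def __eq__(self, other):
        if not isinstance(other, Host):
            return NotImplemented
        return self._host_id == other._host_id

    def __hash__(self):
        return hash(self._host_id)


class Metadata:
    def __init__(self):
        self._hosts = {}                # host_id -> Host (primary)
        self._host_id_by_endpoint = {}  # endpoint -> host_id (secondary)

    def observe(self, host_id, endpoint):
        """Apply one (host_id, endpoint) observation and report the outcome."""
        replaced = False
        owner = self._host_id_by_endpoint.get(endpoint)
        if owner is not None and owner != host_id:
            # Same endpoint, different host_id: the old node was replaced.
            del self._hosts[owner]
            del self._host_id_by_endpoint[endpoint]
            replaced = True
        host = self._hosts.get(host_id)
        if host is None:
            self._hosts[host_id] = Host(host_id, endpoint)
            self._host_id_by_endpoint[endpoint] = host_id
            return "replaced" if replaced else "added"
        if host.endpoint != endpoint:
            # Same host_id, different endpoint: the single replacement path.
            self._host_id_by_endpoint.pop(host.endpoint, None)
            host.endpoint = endpoint
            self._host_id_by_endpoint[endpoint] = host_id
            return "moved"
        return "unchanged"


meta = Metadata()
a, b = uuid.uuid4(), uuid.uuid4()
r1 = meta.observe(a, "10.0.0.1")  # new host_id
host_a = meta._hosts[a]
hash_before = hash(host_a)
r2 = meta.observe(a, "10.0.0.2")  # same host_id, new endpoint
hash_after = hash(host_a)
r3 = meta.observe(b, "10.0.0.2")  # same endpoint, new host_id
```

In this sketch the endpoint is still assigned directly inside `observe`; the point is only that every endpoint change goes through one method that also keeps the secondary index in step.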

Non-goals

  • Do not rewrite all reactor implementations in one PR.
  • Do not fold every pool-management cleanup from "Fix pool management" (#382) into the first Host identity PR.
  • Do not change query routing semantics except where required to preserve correctness across topology changes.
  • Do not remove endpoint lookup entirely; keep it as an index, not as identity.

Acceptance criteria

  • Host identity is defined by host_id, not endpoint/address.
  • hash(host) remains stable across endpoint/topology updates.
  • Equal Host objects have equal hashes.
  • Control connection refresh no longer resolves existing hosts by endpoint before host_id when host_id is known.
  • Metadata keeps endpoint lookup as a secondary index and updates it through a controlled topology path.
  • Same-host_id endpoint movement updates metadata, sessions, pools, policies, and route state consistently.
  • Same-endpoint/different-host_id is handled as host replacement/rebirth.
  • Stale pool creation, connection replacement, and failure callbacks cannot overwrite or damage current host state.
  • Unit tests cover identity semantics, endpoint replacement, stale async publication, and metadata index consistency.
  • Integration coverage exists for live endpoint/IP movement where feasible.
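
The stale-publication criterion can be illustrated with a small generation-fencing sketch. `PoolRegistry` and its methods are hypothetical names introduced here, not existing driver APIs; the idea is that async pool creation captures a generation when it starts and may publish only if that generation is still current.

```python
import threading


class PoolRegistry:
    """Hypothetical session-side registry fenced by a per-host generation."""

    def __init__(self):
        self._lock = threading.Lock()
        self._generation = {}  # host_id -> int
        self._pools = {}       # host_id -> pool object

    def begin_pool_creation(self, host_id):
        """Capture the generation that async pool creation starts under."""
        with self._lock:
            return self._generation.setdefault(host_id, 0)

    def bump(self, host_id):
        """Advance the generation on a topology change; return the old pool."""
        with self._lock:
            self._generation[host_id] = self._generation.get(host_id, 0) + 1
            return self._pools.pop(host_id, None)

    def publish(self, host_id, generation, pool):
        """Publish only if topology has not moved on since creation began."""
        with self._lock:
            if self._generation.get(host_id) != generation:
                return False
            self._pools[host_id] = pool
            return True


registry = PoolRegistry()
token = registry.begin_pool_creation("host-a")
registry.bump("host-a")  # endpoint replaced while the pool was connecting
stale = registry.publish("host-a", token, "pool@old-endpoint")
fresh_token = registry.begin_pool_creation("host-a")
fresh = registry.publish("host-a", fresh_token, "pool@new-endpoint")
```

The stale publish is rejected and the fresh one is accepted. A unit test for stale async publication could follow the same shape: start creation, bump the generation, then assert that the late result is discarded.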

Proposed Implementation Order

  1. "Host equality/hash depend on mutable endpoint" (#867) and "Replace endpoint-based host lookup with host-id-first topology updates" (#922) - Make host_id the canonical host identity and remove endpoint-first host lookup/update behavior from topology paths.
  2. "Introduce explicit endpoint replacement flow for existing host_id" (#923) - Add the explicit same-host_id endpoint replacement flow and route control connection refresh through it.
  3. "Add topology-generation fencing for pools and connections" (#925) - Add topology-generation fencing so pools, connections, and async callbacks can prove they still belong to current host state.
  4. "ResponseFuture can return or borrow connections from a stale endpoint pool" (#857), "HostConnection can signal failures from stale or replaced pools" (#858), "HostConnection can publish replacement shard connections after endpoint changes" (#859), and "Session pool creation can publish stale pools to replacement hosts" (#860) - Fix stale response, failure, shard-connection, and pool-publication bugs using the generation fencing from #925.
  5. "Freeze Host identity and topology fields after endpoint replacement path is in place" (#924) - Freeze Host identity/topology fields after direct mutation callsites are gone and the replacement flow owns endpoint movement.
  6. "Client routes: partial CLIENT_ROUTES_CHANGE updates can break connection-id stickiness for a host" (#813) - Rework client route storage/stickiness around stable host identity, preserving all known routes per host where needed.
  7. "Add topology-change tests for host-id identity and endpoint replacement" (#926) - Land focused topology-change tests throughout the sequence and close this issue after host-id identity, endpoint replacement, stale async fencing, and route-state coverage are complete.
  8. "Fix pool management" (#382) - Continue the broader pool-management architecture cleanup once the scoped host identity, endpoint replacement, and stale-publication correctness work has landed.
