feat(orchestrator): make merge-conflict check asynchronous via runway#245
Merged
Conversation
This was referenced Jun 15, 2026
b8f3e22 to
ddbb5ab
Compare
2d68e19 to
e6d2164
Compare
ddbb5ab to
edf7ab9
Compare
e6d2164 to
a51a86b
Compare
behinddwalls
commented
Jun 16, 2026
a51a86b to
d565b91
Compare
edf7ab9 to
3e8071d
Compare
d565b91 to
00ed26c
Compare
sbalabanov
requested changes
Jun 16, 2026
3e8071d to
cd798f6
Compare
00ed26c to
2ee27cf
Compare
cd798f6 to
3e8071d
Compare
a970815 to
bc59a27
Compare
3e8071d to
78d66a5
Compare
bc59a27 to
d9dc61f
Compare
d9dc61f to
7ab64d3
Compare
35e8531 to
cee7fb5
Compare
e0d4543 to
f8bc7f6
Compare
e4512bb to
7f52edf
Compare
7f52edf to
265fcfe
Compare
b337b8f to
9c029ed
Compare
265fcfe to
802e3a1
Compare
9c029ed to
be8fc07
Compare
802e3a1 to
4316027
Compare
mnoah1
approved these changes
Jun 18, 2026
behinddwalls
added a commit
that referenced
this pull request
Jun 18, 2026
## Summary ### Why? The message queue topic-binding proto option was named `topics`, which reads as a concrete wire topic name. It is not — it carries a stable **logical topic key**. Each implementer maps the key to whatever topic name its broker/queue requires (subject to that backend's naming constraints); on our Go side the keys are `consumer.TopicKey` values, mapped to concrete names through `TopicRegistry`. The name `topics` invited the wrong mental model. ### What? Rename the `google.protobuf.MessageOptions` extension `topics` → `topic_keys` (field number `50001` unchanged, so the wire/extension layout is identical) and reframe its doc comment to say it carries a logical key, not a wire name. Regenerated `messagequeue.pb.go` (`E_Topics` → `E_TopicKeys`). This is the base of a stack; the runway contract and the contract RFC that consume the option are updated in the branches stacked on top. ## Test Plan ✅ `make proto` — descriptor field renamed (`name=topics` → `name=topic_keys`), field number `50001` unchanged ✅ `./tool/bazel build //...` ## Issues ## Stack 1. @ #264 1. #259 1. #260 1. #245 1. #247
4316027 to
107f266
Compare
be8fc07 to
32c633a
Compare
behinddwalls
added a commit
that referenced
this pull request
Jun 18, 2026
## Summary
### Why?
Queue payloads are Go structs serialized with `encoding/json`, so the
wire shape is defined only by Go source. There is no language-neutral
contract a non-Go client can compile against, no explicit
topic-to-payload binding, and no distinction between a domain's private
wiring and a published cross-domain contract.
### What?
Doc-only — the design of record for message queue contracts
(`doc/rfc/messagequeue-contract.md`):
- **Contract language: Protobuf**, serialized as **protobuf JSON**
(`protojson`) so payloads stay self-describing JSON on the wire. The
`.proto` is the authority and the Go binding is generated from it, so it
cannot drift (no hand-authored struct, no drift test to keep them in
sync).
- **Topic binding:** a custom `topics` proto option (defined in
`api/base/messagequeue`) carries the wire topic names on the message
itself, read back by reflection — not on the publish/consume hot path,
which still resolves topics from a `consumer.TopicKey` via the registry.
- **Location by audience:** external contracts (something outside the
owning domain depends on them) live in `api/{domain}/messagequeue/`;
internal ones in `{domain}/core/messagequeue/`, with the split enforced
by Bazel `visibility`.
- Documents the accepted protojson conventions (snake_case field names,
UPPER_SNAKE enums, int64-as-string) and why JSON Schema, binary proto,
and Avro were rejected.
## Test Plan
- ✅ `make lint` (doc-only)
## Issues
## Stack
1. #264
1. @ #259
1. #260
1. #245
1. #247
107f266 to
b50d730
Compare
32c633a to
9789d66
Compare
b50d730 to
ac249c7
Compare
9789d66 to
f87ee0d
Compare
behinddwalls
added a commit
that referenced
this pull request
Jun 18, 2026
## Summary ### Why? Runway's merge queues are the first cross-domain message queue contract: a client (potentially non-Go) publishes a merge request and consumes the result without access to Runway's Go types or storage. Per the RFC (#259) this needs a language-neutral, proto-defined contract. ### What? Establishes the message queue contract pattern using Runway's merge queues as the reference: - Adds `api/runway/messagequeue/proto/merge.proto` defining `MergeRequest`/`MergeResult`, reusing the shared `Change` and `Strategy` types and the `topics` option from `api/base`; generated into `protopb`. - Payloads are serialized as **protobuf JSON** via thin `protojson` helpers (`MergeRequestToBytes`/`MergeRequestFromBytes` and the `MergeResult` counterparts); a `Topics()` reflection helper exposes the topic binding. Topic keys are co-located with the contract. - Moves the merge bindings out of `runway/entity` into `api/runway/messagequeue`. - A drift test round-trips the payloads and asserts every Runway topic key is bound to exactly one message via the `topics` option. ## Test Plan - ✅ `./tool/bazel test //api/runway/messagequeue:messagequeue_test` - ✅ `./tool/bazel build //...` ## Issues ## Stack 1. #264 1. #259 1. @ #260 1. #245 1. #247
The merge-conflict check ran synchronously inside the `validate` consumer by calling the `mergechecker` extension inline. A real merge attempt is slow and I/O-heavy, so doing it on the partition lease blocks the pipeline and couples SubmitQueue to the checker's latency. This moves the check to an asynchronous round-trip with runway, modelled on `build`/`buildsignal` but across a service boundary. The pipeline gains `validate ⇢ (runway) ⇢ mergeconflictsignal → batch`: - `validate` drops the inline `mergechecker` call and, after its existing dedup + change-metadata + claim work, publishes the full `MergeRequest` to the runway-owned `merge-conflict-check` queue, keyed by the request id as the client-owned correlation id. The id round-trips, so the result correlates straight back to the request (unlike `build`, whose server-generated id needs a mapping store). The hand-off to runway is retryable. - `mergeconflictsignal` (new) consumes runway's `MergeResult` off `merge-conflict-check-signal`, advances the request to `batch` when mergeable, or fails it when conflicted. - DLQ reconcilers drive the request to `Error` on dead-letter; the signal DLQ reads the request id straight off the result. The check is triggered directly from `validate`, not a standalone `mergeconflict` trigger stage: that hop only forwarded the request id before publishing to runway, so folding it into `validate` saves a queue round-trip and removes a stage (its topic key, consumer, and DLQ entry) with no change to the cross-boundary contract. Crossing the runway boundary is why these payloads carry full data rather than entity IDs; the queue-payload-boundary rule is documented in CLAUDE.md, with the pipeline diagram and stage table updated in workflow.md and the superseded `mergechecker` validate-path row noted in extension-contract.md. The `mergechecker` package is left in-tree (unused on the validate path); removing it is a follow-up. Runway's service implementation is out of scope -- only its contract (added in the parent PR) is consumed here. ✅ `bazel build //...` ✅ `bazel test //... --test_tag_filters=-integration,-e2e` (56 tests pass) ✅ `make gazelle` clean
f87ee0d to
370d4cd
Compare
behinddwalls
added a commit
that referenced
this pull request
Jun 18, 2026
## Summary ### Why? The committing merge ran synchronously inside the orchestrator, blocking the partition lease on a slow, I/O-heavy operation and coupling SubmitQueue to the merger's latency. This moves the merge to an asynchronous round-trip with Runway — mirroring the merge-conflict check (#245), but for the committing merge. ### What? Adds `batch → merge ⇢ (runway) ⇢ mergesignal → conclude/speculate`: - `merge` (new) consumes a batch ready to land, builds the full `MergeRequest` from the batch's member requests (one `MergeStep` per request in Contains order, carrying each request's change and land strategy), and publishes it to Runway's `merger` queue keyed by the batch id as the client-owned correlation id. - `mergesignal` (new) consumes Runway's `MergeResult` off `merger-signal`, correlates by the echoed id, and transitions the batch to `Succeeded` (merged) or `Failed` (could not) — then fans out to `conclude` (so member requests pick up the outcome) and `speculate` (so dependents can re-plan). Purely result-driven; no poll loop. - DLQ reconcilers drive the batch to a terminal state on dead-letter. Reuses the Runway merge contract from #260 — the same `MergeRequest`/`MergeResult`, carried on the committing `merger`/`merger-signal` topics rather than the dry-run check topics. Runway's service implementation is out of scope. ## Test Plan - ✅ `./tool/bazel build //...` - ✅ `./tool/bazel test //... --test_tag_filters=-integration,-e2e` (57 tests pass) ## Issues ## Stack 1. #264 1. #259 1. #260 1. #245 1. @ #247
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Why?
The merge-conflict check ran synchronously inside the
validateconsumer by calling themergecheckerextension inline. A real merge attempt is slow and I/O-heavy, so doing it on the partition lease blocks the pipeline and couples SubmitQueue to the checker's latency. This moves the check to an asynchronous round-trip with runway, modelled onbuild/buildsignalbut across a service boundary.What?
The pipeline gains
validate ⇢ (runway) ⇢ mergeconflictsignal → batch:validatedrops the inlinemergecheckercall and, after its existing dedup + change-metadata + claim work, publishes the fullMergeRequestto the runway-ownedmerge-conflict-checkerqueue, keyed by the request id as the client-owned correlation id. The id round-trips, so the result correlates straight back to the request (unlikebuild, whose server-generated id needs a mapping store). The hand-off to runway is retryable.mergeconflictsignal(new) consumes runway'sMergeResultoffmerge-conflict-checker-signal, advances the request tobatchwhen mergeable, or fails it when conflicted.Erroron dead-letter; the signal DLQ reads the request id straight off the result.The check is triggered directly from
validate, not a standalonemergeconflicttrigger stage: that hop only forwarded the request id before publishing to runway, so folding it intovalidatesaves a queue round-trip and removes a stage (its topic key, consumer, and DLQ entry) with no change to the cross-boundary contract.Crossing the runway boundary is why these payloads carry full data rather than entity IDs; the queue-payload-boundary rule is documented in CLAUDE.md, with the pipeline diagram and stage table updated in workflow.md and the superseded
mergecheckervalidate-path row noted in extension-contract.md. Themergecheckerpackage is left in-tree (unused on the validate path); removing it is a follow-up. Runway's service implementation is out of scope — only its contract (added in #260) is consumed here.Test Plan
bazel build //...bazel test //... --test_tag_filters=-integration,-e2e(56 tests pass)make gazellecleanIssues
Stack