perf(semantic-search): index from manifest projection #64
Greptile Summary
This PR threads
Confidence Score: 5/5
Safe to merge: the change is additive and backward-compatible, both scan branches are symmetric, and the test suite now covers all four cases (changed with revision, skipped with revision, changed without revision, and the negative undefined assertion). The field is optional, so no persisted records break on read. The conditional spread pattern is identical in both branches, and the PR addresses the previously flagged gap by testing both the skipped and changed paths explicitly. No files require special attention.
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["scan()"] --> B["Load manifest projections"]
B --> C{"For each manifest:\nstoredRow?.fingerprint\n== indexFingerprint?"}
C -- "Yes (fingerprint match)" --> D["Build skipped IndexJob\n+ sourceRevision spread"]
C -- "No (new/changed)" --> E["Build pendingChunk IndexJob\n+ sourceRevision spread"]
D --> F["Increment skipped counter"]
E --> G["Increment changed counter"]
F --> H["putJobs()"]
G --> H
H --> I["Return scan result\n(processed, changed, skipped, …)"]
style D fill:#d4edda
style E fill:#d4edda
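The two branches in the flowchart can be sketched as below. This is a minimal illustration, not the plugin's actual code: the `ManifestProjection`, `IndexJob`, and `ScanResult` shapes, the `scanManifests` name, and the `storedFingerprints` map are all assumptions made for the example. Only the fingerprint comparison, the `skipped`/`pendingChunk` split, and the conditional `sourceRevision` spread come from the review.

```typescript
// Hypothetical shapes for illustration; the real types live in the semantic-search plugin.
interface ManifestProjection {
  toolId: string;
  indexFingerprint: string;
  sourceRevision?: string;
}

interface IndexJob {
  toolId: string;
  fingerprint: string;
  status: "skipped" | "pendingChunk";
  sourceRevision?: string;
}

interface ScanResult {
  processed: number;
  changed: number;
  skipped: number;
  jobs: IndexJob[];
}

function scanManifests(
  manifests: ManifestProjection[],
  storedFingerprints: Map<string, string>, // assumed lookup: toolId -> stored fingerprint
): ScanResult {
  const jobs: IndexJob[] = [];
  let changed = 0;
  let skipped = 0;

  for (const manifest of manifests) {
    const unchanged = storedFingerprints.get(manifest.toolId) === manifest.indexFingerprint;

    // Conditional spread: when the manifest has no revision, the key is omitted
    // entirely rather than written as `sourceRevision: undefined`.
    const revision =
      manifest.sourceRevision !== undefined ? { sourceRevision: manifest.sourceRevision } : {};

    // Both branches build the job the same way; only the status differs.
    jobs.push({
      toolId: manifest.toolId,
      fingerprint: manifest.indexFingerprint,
      status: unchanged ? "skipped" : "pendingChunk",
      ...revision,
    });

    if (unchanged) skipped += 1;
    else changed += 1;
  }

  return { processed: manifests.length, changed, skipped, jobs };
}
```

Building the spread once and reusing it for both statuses is one way to keep the branches symmetric, which is the property the review calls out.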
Reviews (3): Last reviewed commit: "test(semantic-search): cover skipped man..."
## Summary
Hardens the tool manifest projection used by indexing so manifest reads
observe the same stale-catalog freshness behavior as `tools.list`, and
lets plugins persist explicit source revisions for produced tool
catalogs.
This keeps the existing separate `tool_schema_manifest` table and column
shape. The change is a greenfield semantic cleanup of the manifest
contract, not a table rewrite.
## Changes
- Run stale connection catalog sync before `executor.tools.manifest()`
reads manifest rows.
- Add optional `ResolveToolsResult.sourceRevision` so plugins can
describe the source revision that produced a catalog.
- Persist plugin-provided source revisions into manifest rows, falling
back to the existing generic integration/connection revision when
absent.
- Set OpenAPI source revisions from `specHash`.
- Set MCP source revisions from a hash of the discovered MCP tool
manifest.
- Add SDK regressions for stale manifest self-healing and
plugin-provided source revision persistence.
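As a rough illustration of the MCP bullet above, a source revision could be derived by hashing a canonical serialization of the discovered tools. This is a sketch under assumptions: the `DiscoveredTool` shape, the `stableStringify` helper, the `manifestSourceRevision` name, and the choice of SHA-256 are all hypothetical, and the PR does not say how the MCP plugin actually computes its hash.

```typescript
import { createHash } from "node:crypto";

// Hypothetical minimal shape of a discovered MCP tool.
interface DiscoveredTool {
  name: string;
  description?: string;
  inputSchema: unknown;
}

// Serialize with sorted object keys so that key order does not affect the hash.
function stableStringify(value: unknown): string {
  if (value === null || typeof value !== "object") {
    // JSON.stringify yields undefined for undefined/functions at runtime; map those to "null".
    return JSON.stringify(value) ?? "null";
  }
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(",")}]`;
  }
  const entries = Object.entries(value as Record<string, unknown>)
    .filter(([, v]) => v !== undefined)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return `{${entries.map(([k, v]) => `${JSON.stringify(k)}:${stableStringify(v)}`).join(",")}}`;
}

// Assumed approach: sort tools by name, then hash the canonical JSON with SHA-256.
function manifestSourceRevision(tools: DiscoveredTool[]): string {
  const sorted = [...tools].sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
  return createHash("sha256").update(stableStringify(sorted)).digest("hex");
}
```

With a canonical form like this, the revision changes only when the tool catalog itself changes, not when the server returns the same tools in a different order.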
## Call Stack
```text
executor.tools.manifest(filter)
-> syncStaleConnectionTools()
-> compare integration.config_revised_at with connection.tools_synced_at
-> produceConnectionTools() for stale connections
-> read tool_schema_manifest rows
-> apply filters and policy visibility
-> return compact indexing projection
```
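The staleness comparison in the call stack above might look roughly like the following. The row shapes, the `integration_id` join column, and the `findStaleConnections` name are assumptions for illustration. Treating a never-synced connection (`tools_synced_at` of `null`) as stale, and using a strict "older than" comparison, are likewise guesses about the intended semantics rather than something the PR states.

```typescript
// Hypothetical row shapes; the two timestamp names mirror the call stack above.
interface IntegrationRow {
  id: string;
  config_revised_at: Date;
}

interface ConnectionRow {
  id: string;
  integration_id: string; // assumed join column
  tools_synced_at: Date | null; // null is assumed to mean "never synced"
}

function findStaleConnections(
  integrations: IntegrationRow[],
  connections: ConnectionRow[],
): ConnectionRow[] {
  const revisedAt = new Map(
    integrations.map((i) => [i.id, i.config_revised_at.getTime()] as const),
  );

  return connections.filter((connection) => {
    const revised = revisedAt.get(connection.integration_id);
    if (revised === undefined) return false; // no integration row to compare against
    // Stale when never synced, or synced before the latest config revision.
    return (
      connection.tools_synced_at === null || connection.tools_synced_at.getTime() < revised
    );
  });
}
```

Under this sketch, only the connections returned here would be passed to `produceConnectionTools()` before the manifest rows are read.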
```text
plugin.resolveTools()
-> returns tools, definitions, optional sourceRevision
produceConnectionTools()
-> sourceRevision ?? generic integration/connection revision
-> write tool rows, definition rows, manifest rows, catalog revision row
```
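The fallback step above reduces to a nullish-coalescing expression. In this sketch, `ResolveToolsResult.sourceRevision` is taken from the PR description, while the `ManifestRow` shape, the `buildManifestRows` helper, and the `genericRevision` parameter are hypothetical names used only to make the example self-contained.

```typescript
// Only the optional `sourceRevision` field is taken from the PR; the rest is illustrative.
interface ResolveToolsResult {
  tools: { name: string }[];
  sourceRevision?: string;
}

// Hypothetical manifest row: just enough fields to show where the revision lands.
interface ManifestRow {
  toolName: string;
  sourceRevision: string;
}

function buildManifestRows(result: ResolveToolsResult, genericRevision: string): ManifestRow[] {
  // Prefer the plugin-provided revision; fall back to the generic
  // integration/connection revision only when it is absent.
  const sourceRevision = result.sourceRevision ?? genericRevision;
  return result.tools.map((tool) => ({ toolName: tool.name, sourceRevision }));
}
```

Note that `??` falls back only on `null` or `undefined`, so in this sketch a plugin that returned an empty string would have that empty string persisted.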
## Tests
- `bun run --cwd packages/core/sdk test -- src/executor.test.ts`
- `bun run --cwd packages/core/sdk typecheck`
- `bun run --cwd packages/plugins/openapi test --
src/sdk/spec-blob.test.ts src/sdk/plugin.test.ts`
- `bun run --cwd packages/plugins/openapi typecheck`
- `bun run --cwd packages/plugins/mcp test -- src/sdk/plugin.test.ts`
- `bun run --cwd packages/plugins/mcp typecheck`
- `bun run --cwd packages/plugins/semantic-search test --
src/sdk/tool-search-index.test.ts`
- `bun run --cwd packages/plugins/semantic-search typecheck`
- `bunx oxfmt --check packages/core/sdk/src/plugin.ts
packages/core/sdk/src/executor.ts packages/core/sdk/src/executor.test.ts
packages/plugins/openapi/src/sdk/backing.ts
packages/plugins/mcp/src/sdk/plugin.ts`
- `bunx oxlint -c .oxlintrc.jsonc --deny-warnings
packages/core/sdk/src/plugin.ts packages/core/sdk/src/executor.ts
packages/core/sdk/src/executor.test.ts
packages/plugins/openapi/src/sdk/backing.ts
packages/plugins/mcp/src/sdk/plugin.ts`
- `git diff --check`
## Notes
No deploy was performed. This is one milestone PR rather than a stack
because the existing semantic-search indexing code already consumes
`tools.manifest()` and passed its targeted tests against this change.
<!-- stack:links:start -->
### [Stack](https://github.com/aryasaatvik/stack)
1. **#63** 👈 current
2. #64
3. #65
<!-- stack:links:end -->
Carry tool manifest source revisions into index scan jobs so indexing work records the catalog source version that produced each fingerprint. This keeps semantic-search tied to the manifest projection without adding schema reads to scan.
3beadf6 to 14f8258
## Summary
Hardens the per-run manifest snapshot layer so scans are pinned to the
manifest captured at index creation time. This protects the KV-only scan
path from accidentally falling back to a later live catalog read.
## Changes
- Add regression coverage for create-time manifest snapshot stability.
- Prove `scan` does not reread a changed live manifest after `create`.
- Assert the scanned job keeps the original snapshot fingerprint and
source revision.
## Call Stack
```text
ToolSearchIndex.create()
-> listToolManifests() once
-> partition manifests
-> write index-manifest/v1/{runId}/{partition} to executor.cache
ToolSearchIndex.scan()
-> read partition snapshot from executor.cache
-> never call executor.tools.manifest()
-> materialize jobs from the snapshot
```
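The pinning behavior in the call stack above can be sketched with an in-memory stand-in for `executor.cache`. Everything here is an assumption for illustration except the key layout `index-manifest/v1/{runId}/{partition}`: the `MemoryCache` class, the round-robin partitioning, the JSON encoding, and the function names are hypothetical. The sketch is also kept synchronous for brevity, whereas the real cache API is presumably asynchronous.

```typescript
// Hypothetical manifest shape carried in each snapshot.
interface Manifest {
  toolId: string;
  fingerprint: string;
  sourceRevision?: string;
}

// In-memory stand-in for executor.cache (assumed key/value string store).
class MemoryCache {
  private store = new Map<string, string>();
  put(key: string, value: string): void {
    this.store.set(key, value);
  }
  get(key: string): string | undefined {
    return this.store.get(key);
  }
}

// Key layout taken from the call stack above.
const snapshotKey = (runId: string, partition: number): string =>
  `index-manifest/v1/${runId}/${partition}`;

function createIndex(
  cache: MemoryCache,
  runId: string,
  partitionCount: number,
  listToolManifests: () => Manifest[],
): void {
  const manifests = listToolManifests(); // the only live manifest read for this run
  const partitions: Manifest[][] = Array.from({ length: partitionCount }, () => []);
  // Round-robin partitioning is an assumption; the real scheme is not described.
  manifests.forEach((manifest, i) => partitions[i % partitionCount]?.push(manifest));
  partitions.forEach((partition, i) => cache.put(snapshotKey(runId, i), JSON.stringify(partition)));
}

function scanPartition(cache: MemoryCache, runId: string, partition: number): Manifest[] {
  const raw = cache.get(snapshotKey(runId, partition));
  // Fail loudly instead of falling back to a live catalog read.
  if (raw === undefined) throw new Error(`missing snapshot ${snapshotKey(runId, partition)}`);
  return JSON.parse(raw) as Manifest[];
}
```

Because `scanPartition` has no access to `listToolManifests`, a catalog change after `createIndex` cannot leak into the scan, which is the property the regression test in this PR is described as proving.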
## Tests
- `bun run --cwd packages/plugins/semantic-search test --
src/sdk/tool-search-index.test.ts`
- `bun run --cwd packages/plugins/semantic-search typecheck`
- `bunx oxfmt --check
packages/plugins/semantic-search/src/sdk/tool-search-index.test.ts`
- `bunx oxlint -c .oxlintrc.jsonc --deny-warnings
packages/plugins/semantic-search/src/sdk/tool-search-index.test.ts`
## Stack
Base: #64
<!-- stack:links:start -->
### [Stack](https://github.com/aryasaatvik/stack)
1. #63
2. #64
3. **#65** 👈 current
<!-- stack:links:end -->
## Summary
Carries the tool manifest projection into semantic-search scan jobs by retaining each manifest source revision alongside the fingerprint. This makes index work traceable to the source catalog version that produced it without adding schema reads to scan.
## Changes
- Add `sourceRevision` to semantic-search index jobs.
- Carry `ToolSchemaManifest.sourceRevision` into both skipped and changed scan jobs.
## Call Stack
## Tests
- `bun run --cwd packages/plugins/semantic-search test -- src/sdk/tool-search-index.test.ts`
- `bun run --cwd packages/plugins/semantic-search typecheck`
- `bunx oxfmt --check packages/plugins/semantic-search/src/sdk/collections.ts packages/plugins/semantic-search/src/sdk/tool-search-index.ts packages/plugins/semantic-search/src/sdk/tool-search-index.test.ts`
- `bunx oxlint -c .oxlintrc.jsonc --deny-warnings packages/plugins/semantic-search/src/sdk/collections.ts packages/plugins/semantic-search/src/sdk/tool-search-index.ts packages/plugins/semantic-search/src/sdk/tool-search-index.test.ts`
## Stack
Base: #63