perf(semantic-search): index from manifest projection by aryasaatvik · Pull Request #64 · aryasaatvik/executor

aryasaatvik · 2026-06-24T15:08:27Z

Summary

Carries the tool manifest projection into semantic-search scan jobs by retaining each manifest source revision alongside the fingerprint. This makes index work traceable to the source catalog version that produced it without adding schema reads to scan.

Changes

Add optional sourceRevision to semantic-search index jobs.
Copy ToolSchemaManifest.sourceRevision into both skipped and changed scan jobs.
Add regression coverage that manifest source revisions survive scan materialization.

Call Stack

ToolSearchIndex.create()
  -> executor.tools.manifest()
  -> write manifest snapshot

ToolSearchIndex.scan()
  -> read manifest projection
  -> compare manifest.indexFingerprint with stored fingerprint
  -> write IndexJob(sourceRevision, fingerprint, path, status)
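The scan path in the call stack above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `ManifestEntry`, `ScanResult`, `scanManifests`, the tool paths, and the in-memory fingerprint map are assumptions. Only `sourceRevision`, `fingerprint`, `path`, `status`, and `indexFingerprint` appear in the PR text, and the `skipped` / `pendingChunk` status values are taken from the review summary.

```typescript
// Hypothetical shapes for illustration; not the plugin's real types.
type JobStatus = "skipped" | "pendingChunk";

interface ManifestEntry {
  path: string;
  indexFingerprint: string;
  sourceRevision?: string;
}

interface IndexJob {
  path: string;
  fingerprint: string;
  status: JobStatus;
  sourceRevision?: string;
}

interface ScanResult {
  jobs: IndexJob[];
  processed: number;
  changed: number;
  skipped: number;
}

// Materialize one job per manifest entry using only the manifest projection
// and the previously stored fingerprints (no schema reads).
function scanManifests(
  manifests: readonly ManifestEntry[],
  storedFingerprints: ReadonlyMap<string, string>,
): ScanResult {
  const jobs: IndexJob[] = [];
  let changed = 0;
  let skipped = 0;

  for (const manifest of manifests) {
    const unchanged =
      storedFingerprints.get(manifest.path) === manifest.indexFingerprint;
    const status: JobStatus = unchanged ? "skipped" : "pendingChunk";

    jobs.push({
      path: manifest.path,
      fingerprint: manifest.indexFingerprint,
      status,
      // Omit the key entirely when the manifest carries no revision.
      ...(manifest.sourceRevision === undefined
        ? {}
        : { sourceRevision: manifest.sourceRevision }),
    });

    if (unchanged) skipped += 1;
    else changed += 1;
  }

  return { jobs, processed: manifests.length, changed, skipped };
}

// Usage with made-up tool paths and fingerprints.
const result = scanManifests(
  [
    { path: "github.issues.list", indexFingerprint: "fp-1", sourceRevision: "spec-hash-v1" },
    { path: "github.issues.get", indexFingerprint: "fp-2", sourceRevision: "spec-hash-v1" },
    { path: "local.echo", indexFingerprint: "fp-3" },
  ],
  new Map([["github.issues.list", "fp-1"]]),
);
```

In this sketch both the skipped and the changed branch go through the same job construction, so the revision cannot be carried on one path and dropped on the other.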

Tests

bun run --cwd packages/plugins/semantic-search test -- src/sdk/tool-search-index.test.ts
bun run --cwd packages/plugins/semantic-search typecheck
bunx oxfmt --check packages/plugins/semantic-search/src/sdk/collections.ts packages/plugins/semantic-search/src/sdk/tool-search-index.ts packages/plugins/semantic-search/src/sdk/tool-search-index.test.ts
bunx oxlint -c .oxlintrc.jsonc --deny-warnings packages/plugins/semantic-search/src/sdk/collections.ts packages/plugins/semantic-search/src/sdk/tool-search-index.ts packages/plugins/semantic-search/src/sdk/tool-search-index.test.ts

Stack

Base: #63

greptile-apps · 2026-06-24T15:11:22Z

Greptile Summary

This PR threads sourceRevision from the ToolSchemaManifest projection into both branches of scan() — skipped and pendingChunk — so every emitted IndexJob can be traced back to the catalog version that produced it, without any additional schema reads at runtime.

collections.ts: sourceRevision: Schema.optional(Schema.String) is appended to IndexJob; the field is optional so existing persisted records deserialize cleanly without a migration.
tool-search-index.ts: Both job-construction branches use the same conditional spread (manifest.sourceRevision === undefined ? {} : { sourceRevision }) to omit the field when the manifest does not carry one.
tool-search-index.test.ts: Adds a single test covering both the changed and skipped paths with a manifest that carries sourceRevision: "spec-hash-v1", plus a negative assertion in an existing test confirming the field stays absent when the manifest omits it.
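The backward-compatibility point above (an optional field lets older persisted records decode without a migration) can be illustrated with a hand-written decoder. This is only a plain-TypeScript stand-in for the `Schema.optional(Schema.String)` field described in the review, not the Effect Schema API or the plugin's actual decoding path. The `decodeIndexJob` name and the record values are hypothetical.

```typescript
// Hypothetical job shape; field names follow the PR text.
interface IndexJob {
  path: string;
  fingerprint: string;
  status: string;
  sourceRevision?: string;
}

// Accepts records written before the field existed (key absent) as well as
// records that carry a string revision; rejects any other revision type.
function decodeIndexJob(input: unknown): IndexJob {
  if (typeof input !== "object" || input === null) {
    throw new TypeError("expected an object");
  }
  const { path, fingerprint, status, sourceRevision } = input as Record<string, unknown>;
  if (
    typeof path !== "string" ||
    typeof fingerprint !== "string" ||
    typeof status !== "string"
  ) {
    throw new TypeError("path, fingerprint and status must be strings");
  }
  if (sourceRevision !== undefined && typeof sourceRevision !== "string") {
    throw new TypeError("sourceRevision must be a string when present");
  }
  return {
    path,
    fingerprint,
    status,
    // Keep the key absent for legacy records instead of storing undefined.
    ...(sourceRevision === undefined ? {} : { sourceRevision }),
  };
}

// A record persisted before the change still decodes.
const legacy = decodeIndexJob(
  JSON.parse('{"path":"github.issues.list","fingerprint":"fp-1","status":"skipped"}'),
);

// A record written after the change keeps its revision.
const current = decodeIndexJob({
  path: "github.issues.list",
  fingerprint: "fp-1",
  status: "skipped",
  sourceRevision: "spec-hash-v1",
});
```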

Confidence Score: 5/5

Safe to merge — the change is additive and backward-compatible, both scan branches are symmetric, and the test suite now covers all four cases (changed with revision, skipped with revision, changed without revision, and the negative undefined assertion).

The field is optional so no persisted records break on read. The conditional spread pattern is identical in both branches, and the PR addresses the previously flagged gap by testing both the skipped and changed paths explicitly.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/plugins/semantic-search/src/sdk/collections.ts | Adds optional `sourceRevision: Schema.optional(Schema.String)` field to the `IndexJob` struct; backward-compatible schema change with no migration required. |
| packages/plugins/semantic-search/src/sdk/tool-search-index.ts | Spreads `sourceRevision` from the manifest into both the skipped and pendingChunk job branches using a conditional spread; logic is symmetric and correct in both paths. |
| packages/plugins/semantic-search/src/sdk/tool-search-index.test.ts | Extends `manifestForTool` helper to accept an optional `sourceRevision`; adds a comprehensive regression test covering both the changed and skipped scan branches, plus a negative assertion for the no-revision case. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["scan()"] --> B["Load manifest projections"]
    B --> C{"For each manifest:\nstoredRow?.fingerprint\n== indexFingerprint?"}
    C -- "Yes (fingerprint match)" --> D["Build skipped IndexJob\n+ sourceRevision spread"]
    C -- "No (new/changed)" --> E["Build pendingChunk IndexJob\n+ sourceRevision spread"]
    D --> F["Increment skipped counter"]
    E --> G["Increment changed counter"]
    F --> H["putJobs()"]
    G --> H
    H --> I["Return scan result\n(processed, changed, skipped, …)"]

    style D fill:#d4edda
    style E fill:#d4edda


Reviews (3): Last reviewed commit: "test(semantic-search): cover skipped man..."

## Summary

Hardens the tool manifest projection used by indexing so manifest reads observe the same stale-catalog freshness behavior as `tools.list`, and lets plugins persist explicit source revisions for produced tool catalogs.

This keeps the existing separate `tool_schema_manifest` table and column shape. The change is a greenfield semantic cleanup of the manifest contract, not a table rewrite.

## Changes

- Run stale connection catalog sync before `executor.tools.manifest()` reads manifest rows.
- Add optional `ResolveToolsResult.sourceRevision` so plugins can describe the source revision that produced a catalog.
- Persist plugin-provided source revisions into manifest rows, falling back to the existing generic integration/connection revision when absent.
- Set OpenAPI source revisions from `specHash`.
- Set MCP source revisions from a hash of the discovered MCP tool manifest.
- Add SDK regressions for stale manifest self-healing and plugin-provided source revision persistence.

## Call Stack

```text
executor.tools.manifest(filter)
  -> syncStaleConnectionTools()
  -> compare integration.config_revised_at with connection.tools_synced_at
  -> produceConnectionTools() for stale connections
  -> read tool_schema_manifest rows
  -> apply filters and policy visibility
  -> return compact indexing projection
```

```text
plugin.resolveTools()
  -> returns tools, definitions, optional sourceRevision

produceConnectionTools()
  -> sourceRevision ?? generic integration/connection revision
  -> write tool rows, definition rows, manifest rows, catalog revision row
```

## Tests

- `bun run --cwd packages/core/sdk test -- src/executor.test.ts`
- `bun run --cwd packages/core/sdk typecheck`
- `bun run --cwd packages/plugins/openapi test -- src/sdk/spec-blob.test.ts src/sdk/plugin.test.ts`
- `bun run --cwd packages/plugins/openapi typecheck`
- `bun run --cwd packages/plugins/mcp test -- src/sdk/plugin.test.ts`
- `bun run --cwd packages/plugins/mcp typecheck`
- `bun run --cwd packages/plugins/semantic-search test -- src/sdk/tool-search-index.test.ts`
- `bun run --cwd packages/plugins/semantic-search typecheck`
- `bunx oxfmt --check packages/core/sdk/src/plugin.ts packages/core/sdk/src/executor.ts packages/core/sdk/src/executor.test.ts packages/plugins/openapi/src/sdk/backing.ts packages/plugins/mcp/src/sdk/plugin.ts`
- `bunx oxlint -c .oxlintrc.jsonc --deny-warnings packages/core/sdk/src/plugin.ts packages/core/sdk/src/executor.ts packages/core/sdk/src/executor.test.ts packages/plugins/openapi/src/sdk/backing.ts packages/plugins/mcp/src/sdk/plugin.ts`
- `git diff --check`

## Notes

No deploy was performed. This is one milestone PR rather than a stack because the existing semantic-search indexing code already consumes `tools.manifest()` and passed its targeted tests against this change.

### [Stack](https://github.com/aryasaatvik/stack)

1. **#63** 👈 current
2. #64
3. #65
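The two decisions in the #63 call stacks above, when a connection counts as stale and which revision ends up on a manifest row, might look roughly like this. It is a sketch under stated assumptions: the helper names, the `Date`-typed timestamps, the revision strings, and the rule that a never-synced connection is stale are all hypothetical. Only `config_revised_at`, `tools_synced_at`, `sourceRevision`, and the `??` fallback come from the description.

```typescript
// Hypothetical slice of ResolveToolsResult; the real type has more fields.
interface ResolveToolsResult {
  tools: string[];
  sourceRevision?: string;
}

// Assumption: a connection is stale when the integration config was revised
// after the last tool sync, or when it has never been synced at all.
function isConnectionStale(
  configRevisedAt: Date,
  toolsSyncedAt: Date | undefined,
): boolean {
  return (
    toolsSyncedAt === undefined ||
    configRevisedAt.getTime() > toolsSyncedAt.getTime()
  );
}

// Prefer the plugin-provided revision; otherwise fall back to the generic
// integration/connection revision.
function manifestSourceRevision(
  result: ResolveToolsResult,
  genericRevision: string,
): string {
  return result.sourceRevision ?? genericRevision;
}

// Usage with made-up timestamps and revision strings.
const staleAfterEdit = isConnectionStale(
  new Date("2026-06-24T12:00:00Z"),
  new Date("2026-06-24T11:00:00Z"),
);
const freshAfterSync = isConnectionStale(
  new Date("2026-06-24T12:00:00Z"),
  new Date("2026-06-24T13:00:00Z"),
);
const pluginRevision = manifestSourceRevision(
  { tools: ["issues.list"], sourceRevision: "spec-hash-v1" },
  "connection-rev-7",
);
const fallbackRevision = manifestSourceRevision(
  { tools: ["issues.list"] },
  "connection-rev-7",
);
```

Because `??` only falls back on `null` or `undefined`, a plugin that returns an empty string would keep it as the revision in this sketch. Whether the real code guards against that is not stated in the description.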

Carry tool manifest source revisions into index scan jobs so indexing work records the catalog source version that produced each fingerprint. This keeps semantic-search tied to the manifest projection without adding schema reads to scan.

## Summary

Hardens the per-run manifest snapshot layer so scans are pinned to the manifest captured at index creation time. This protects the KV-only scan path from accidentally falling back to a later live catalog read.

## Changes

- Add regression coverage for create-time manifest snapshot stability.
- Prove `scan` does not reread a changed live manifest after `create`.
- Assert the scanned job keeps the original snapshot fingerprint and source revision.

## Call Stack

```text
ToolSearchIndex.create()
  -> listToolManifests() once
  -> partition manifests
  -> write index-manifest/v1/{runId}/{partition} to executor.cache

ToolSearchIndex.scan()
  -> read partition snapshot from executor.cache
  -> never call executor.tools.manifest()
  -> materialize jobs from the snapshot
```

## Tests

- `bun run --cwd packages/plugins/semantic-search test -- src/sdk/tool-search-index.test.ts`
- `bun run --cwd packages/plugins/semantic-search typecheck`
- `bunx oxfmt --check packages/plugins/semantic-search/src/sdk/tool-search-index.test.ts`
- `bunx oxlint -c .oxlintrc.jsonc --deny-warnings packages/plugins/semantic-search/src/sdk/tool-search-index.test.ts`

## Stack

Base: #64

### [Stack](https://github.com/aryasaatvik/stack)

1. #63
2. #64
3. **#65** 👈 current
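The snapshot pinning described for #65 can be pictured with a small in-memory model. Everything here is an assumption made for illustration: a `Map` stands in for `executor.cache`, the partitioning is round-robin, and `createRun`, `scanPartition`, and `snapshotKey` are hypothetical names. Only the `index-manifest/v1/{runId}/{partition}` key shape and the rule that scan never rereads the live manifest come from the description.

```typescript
interface ManifestEntry {
  path: string;
  indexFingerprint: string;
  sourceRevision?: string;
}

type SnapshotCache = Map<string, ManifestEntry[]>;

const snapshotKey = (runId: string, partition: number): string =>
  `index-manifest/v1/${runId}/${partition}`;

// create: read the live manifest exactly once and write one snapshot per partition.
function createRun(
  runId: string,
  listLive: () => ManifestEntry[],
  cache: SnapshotCache,
  partitionCount: number,
): void {
  const manifests = listLive();
  for (let partition = 0; partition < partitionCount; partition += 1) {
    const slice = manifests
      .filter((_, index) => index % partitionCount === partition)
      .map((entry) => ({ ...entry })); // copy so later live edits cannot leak in
    cache.set(snapshotKey(runId, partition), slice);
  }
}

// scan: read only from the snapshot; there is no live fallback on purpose.
function scanPartition(
  runId: string,
  partition: number,
  cache: SnapshotCache,
): ManifestEntry[] {
  const snapshot = cache.get(snapshotKey(runId, partition));
  if (snapshot === undefined) {
    throw new Error(`missing snapshot ${snapshotKey(runId, partition)}`);
  }
  return snapshot;
}

// Usage: the live catalog changes after create, but scan still sees the snapshot.
let live: ManifestEntry[] = [
  { path: "github.issues.list", indexFingerprint: "fp-1", sourceRevision: "spec-hash-v1" },
];
let liveReads = 0;
const cache: SnapshotCache = new Map();

createRun("run-1", () => { liveReads += 1; return live; }, cache, 1);
live = [
  { path: "github.issues.list", indexFingerprint: "fp-2", sourceRevision: "spec-hash-v2" },
];
const scanned = scanPartition("run-1", 0, cache);
```

In this model a missing snapshot raises an error rather than falling back to the live catalog, which mirrors the stated goal of keeping the scan path KV-only.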

This was referenced Jun 24, 2026

perf(semantic-search): snapshot manifest partitions #65

Merged

feat(sdk): harden tool manifest projection for indexing #63

Merged

greptile-apps Bot reviewed Jun 24, 2026

Comment thread packages/plugins/semantic-search/src/sdk/tool-search-index.test.ts

aryasaatvik changed the base branch from feat/tool-manifest-indexing-foundation to dev June 26, 2026 13:06

aryasaatvik added 2 commits June 26, 2026 18:36

test(semantic-search): cover skipped manifest revisions (greptile)

14f8258

aryasaatvik force-pushed the perf/semantic-search-manifest-indexing branch from 3beadf6 to 14f8258 Compare June 26, 2026 13:06

aryasaatvik merged commit 40d5680 into dev Jun 26, 2026
8 checks passed

aryasaatvik deleted the perf/semantic-search-manifest-indexing branch June 26, 2026 14:47

aryasaatvik merged 2 commits into dev from perf/semantic-search-manifest-indexing
