perf(semantic-search): index from manifest projection #64

Merged: aryasaatvik merged 2 commits into dev from perf/semantic-search-manifest-indexing on Jun 26, 2026

aryasaatvik (Owner) commented Jun 24, 2026

Summary

Carries the tool manifest projection into semantic-search scan jobs by retaining each manifest source revision alongside the fingerprint. This makes index work traceable to the source catalog version that produced it without adding schema reads to scan.

Changes

  • Add optional sourceRevision to semantic-search index jobs.
  • Copy ToolSchemaManifest.sourceRevision into both skipped and changed scan jobs.
  • Add regression coverage that manifest source revisions survive scan materialization.

Call Stack

ToolSearchIndex.create()
  -> executor.tools.manifest()
  -> write manifest snapshot

ToolSearchIndex.scan()
  -> read manifest projection
  -> compare manifest.indexFingerprint with stored fingerprint
  -> write IndexJob(sourceRevision, fingerprint, path, status)
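
The scan half of this call stack can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `ScanManifest`, `ScanJob`, and `ScanSummary` shapes, the `"skipped"`/`"pending"` status values, and the `materializeJobs` helper are all assumptions made for the example. Only `indexFingerprint`, `sourceRevision`, and the idea of carrying the revision into both branches come from the description.

```typescript
// Hypothetical shapes for illustration; the real types live in the plugin SDK.
interface ScanManifest {
  path: string;
  indexFingerprint: string;
  sourceRevision?: string;
}

interface ScanJob {
  path: string;
  fingerprint: string;
  status: "skipped" | "pending";
  sourceRevision?: string;
}

interface ScanSummary {
  jobs: ScanJob[];
  changed: number;
  skipped: number;
}

function materializeJobs(
  manifests: readonly ScanManifest[],
  storedFingerprints: ReadonlyMap<string, string>,
): ScanSummary {
  const summary: ScanSummary = { jobs: [], changed: 0, skipped: 0 };
  for (const manifest of manifests) {
    // A job is skipped only when the stored fingerprint matches the manifest's.
    const unchanged =
      storedFingerprints.get(manifest.path) === manifest.indexFingerprint;
    summary.jobs.push({
      path: manifest.path,
      fingerprint: manifest.indexFingerprint,
      status: unchanged ? "skipped" : "pending",
      // Omit the key entirely when the manifest carries no revision.
      ...(manifest.sourceRevision === undefined
        ? {}
        : { sourceRevision: manifest.sourceRevision }),
    });
    if (unchanged) summary.skipped += 1;
    else summary.changed += 1;
  }
  return summary;
}
```

Because the revision is copied from the manifest entry already in hand, neither branch needs an extra schema read.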

Tests

  • bun run --cwd packages/plugins/semantic-search test -- src/sdk/tool-search-index.test.ts
  • bun run --cwd packages/plugins/semantic-search typecheck
  • bunx oxfmt --check packages/plugins/semantic-search/src/sdk/collections.ts packages/plugins/semantic-search/src/sdk/tool-search-index.ts packages/plugins/semantic-search/src/sdk/tool-search-index.test.ts
  • bunx oxlint -c .oxlintrc.jsonc --deny-warnings packages/plugins/semantic-search/src/sdk/collections.ts packages/plugins/semantic-search/src/sdk/tool-search-index.ts packages/plugins/semantic-search/src/sdk/tool-search-index.test.ts

Stack

Base: #63

Stack

  1. feat(sdk): harden tool manifest projection for indexing #63
  2. perf(semantic-search): index from manifest projection #64 👈 current
  3. perf(semantic-search): snapshot manifest partitions #65

greptile-apps (Bot) commented Jun 24, 2026

Greptile Summary

This PR threads sourceRevision from the ToolSchemaManifest projection into both branches of scan() — skipped and pendingChunk — so every emitted IndexJob can be traced back to the catalog version that produced it, without any additional schema reads at runtime.

  • collections.ts: sourceRevision: Schema.optional(Schema.String) is appended to IndexJob; the field is optional so existing persisted records deserialize cleanly without a migration.
  • tool-search-index.ts: Both job-construction branches use the same conditional spread (manifest.sourceRevision === undefined ? {} : { sourceRevision }) to omit the field when the manifest does not carry one.
  • tool-search-index.test.ts: Adds a single test covering both the changed and skipped paths with a manifest that carries sourceRevision: "spec-hash-v1", plus a negative assertion in an existing test confirming the field stays absent when the manifest omits it.

Confidence Score: 5/5

Safe to merge: the change is additive and backward-compatible, both scan branches are symmetric, and the test suite now covers the changed and skipped paths with a revision, plus a negative assertion that the field stays absent when the manifest omits one.

The field is optional so no persisted records break on read. The conditional spread pattern is identical in both branches, and the PR addresses the previously flagged gap by testing both the skipped and changed paths explicitly.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/plugins/semantic-search/src/sdk/collections.ts | Adds optional sourceRevision: Schema.optional(Schema.String) field to the IndexJob struct; backward-compatible schema change with no migration required. |
| packages/plugins/semantic-search/src/sdk/tool-search-index.ts | Spreads sourceRevision from the manifest into both the skipped and pendingChunk job branches using a conditional spread; logic is symmetric and correct in both paths. |
| packages/plugins/semantic-search/src/sdk/tool-search-index.test.ts | Extends manifestForTool helper to accept an optional sourceRevision; adds a comprehensive regression test covering both the changed and skipped scan branches, plus a negative assertion for the no-revision case. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["scan()"] --> B["Load manifest projections"]
    B --> C{"For each manifest:\nstoredRow?.fingerprint\n== indexFingerprint?"}
    C -- "Yes (fingerprint match)" --> D["Build skipped IndexJob\n+ sourceRevision spread"]
    C -- "No (new/changed)" --> E["Build pendingChunk IndexJob\n+ sourceRevision spread"]
    D --> F["Increment skipped counter"]
    E --> G["Increment changed counter"]
    F --> H["putJobs()"]
    G --> H
    H --> I["Return scan result\n(processed, changed, skipped, …)"]

    style D fill:#d4edda
    style E fill:#d4edda

Reviews (3): Last reviewed commit: "test(semantic-search): cover skipped man..."

Comment thread packages/plugins/semantic-search/src/sdk/tool-search-index.test.ts
@aryasaatvik aryasaatvik changed the base branch from feat/tool-manifest-indexing-foundation to dev June 26, 2026 13:06
aryasaatvik added a commit that referenced this pull request Jun 26, 2026
## Summary

Hardens the tool manifest projection used by indexing so manifest reads
observe the same stale-catalog freshness behavior as `tools.list`, and
lets plugins persist explicit source revisions for produced tool
catalogs.

This keeps the existing separate `tool_schema_manifest` table and column
shape. The change is a greenfield semantic cleanup of the manifest
contract, not a table rewrite.

## Changes

- Run stale connection catalog sync before `executor.tools.manifest()`
reads manifest rows.
- Add optional `ResolveToolsResult.sourceRevision` so plugins can
describe the source revision that produced a catalog.
- Persist plugin-provided source revisions into manifest rows, falling
back to the existing generic integration/connection revision when
absent.
- Set OpenAPI source revisions from `specHash`.
- Set MCP source revisions from a hash of the discovered MCP tool
manifest.
- Add SDK regressions for stale manifest self-healing and
plugin-provided source revision persistence.

## Call Stack

```text
executor.tools.manifest(filter)
  -> syncStaleConnectionTools()
     -> compare integration.config_revised_at with connection.tools_synced_at
     -> produceConnectionTools() for stale connections
  -> read tool_schema_manifest rows
  -> apply filters and policy visibility
  -> return compact indexing projection
```
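
The staleness comparison in the call stack above might look something like the sketch below. The `ConnectionState` shape, the epoch-millisecond timestamps, the `findStaleConnections` helper, and the exact rule for "stale" are assumptions for illustration; the description only says the two timestamps are compared.

```typescript
// Hypothetical row shape: timestamps as epoch milliseconds, null = never synced.
interface ConnectionState {
  id: string;
  configRevisedAt: number; // stands in for integration.config_revised_at
  toolsSyncedAt: number | null; // stands in for connection.tools_synced_at
}

// Assumption: a connection is stale when its tools were never synced, or were
// last synced before the integration config was revised.
function findStaleConnections(
  connections: readonly ConnectionState[],
): ConnectionState[] {
  return connections.filter(
    (c) => c.toolsSyncedAt === null || c.toolsSyncedAt < c.configRevisedAt,
  );
}
```

Under this reading, only the connections returned here would go through `produceConnectionTools()` before the manifest rows are read.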

```text
plugin.resolveTools()
  -> returns tools, definitions, optional sourceRevision
produceConnectionTools()
  -> sourceRevision ?? generic integration/connection revision
  -> write tool rows, definition rows, manifest rows, catalog revision row
```
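
The `sourceRevision ?? generic integration/connection revision` step is a plain nullish fallback. A minimal sketch, assuming a trimmed-down `ResolvedCatalog` type and a hypothetical `manifestSourceRevision` helper (neither name is from the SDK):

```typescript
// Hypothetical, trimmed-down stand-in for ResolveToolsResult.
interface ResolvedCatalog {
  tools: string[];
  sourceRevision?: string;
}

// Prefer the plugin-provided revision; fall back to the generic one when absent.
function manifestSourceRevision(
  resolved: ResolvedCatalog,
  genericRevision: string,
): string {
  return resolved.sourceRevision ?? genericRevision;
}
```

Note that `??` falls back only on `null` or `undefined`, so a plugin that returned an empty string would have that empty string persisted rather than the generic revision.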

## Tests

- `bun run --cwd packages/core/sdk test -- src/executor.test.ts`
- `bun run --cwd packages/core/sdk typecheck`
- `bun run --cwd packages/plugins/openapi test --
src/sdk/spec-blob.test.ts src/sdk/plugin.test.ts`
- `bun run --cwd packages/plugins/openapi typecheck`
- `bun run --cwd packages/plugins/mcp test -- src/sdk/plugin.test.ts`
- `bun run --cwd packages/plugins/mcp typecheck`
- `bun run --cwd packages/plugins/semantic-search test --
src/sdk/tool-search-index.test.ts`
- `bun run --cwd packages/plugins/semantic-search typecheck`
- `bunx oxfmt --check packages/core/sdk/src/plugin.ts
packages/core/sdk/src/executor.ts packages/core/sdk/src/executor.test.ts
packages/plugins/openapi/src/sdk/backing.ts
packages/plugins/mcp/src/sdk/plugin.ts`
- `bunx oxlint -c .oxlintrc.jsonc --deny-warnings
packages/core/sdk/src/plugin.ts packages/core/sdk/src/executor.ts
packages/core/sdk/src/executor.test.ts
packages/plugins/openapi/src/sdk/backing.ts
packages/plugins/mcp/src/sdk/plugin.ts`
- `git diff --check`

## Notes

No deploy was performed. This is one milestone PR rather than a stack
because the existing semantic-search indexing code already consumes
`tools.manifest()` and passed its targeted tests against this change.

<!-- stack:links:start -->
### [Stack](https://github.com/aryasaatvik/stack)

1. **#63** 👈 current
2. #64
3. #65
<!-- stack:links:end -->
Carry tool manifest source revisions into index scan jobs so indexing work records the catalog source version that produced each fingerprint. This keeps semantic-search tied to the manifest projection without adding schema reads to scan.
@aryasaatvik aryasaatvik force-pushed the perf/semantic-search-manifest-indexing branch from 3beadf6 to 14f8258 Compare June 26, 2026 13:06
@aryasaatvik aryasaatvik merged commit 40d5680 into dev Jun 26, 2026
8 checks passed
aryasaatvik added a commit that referenced this pull request Jun 26, 2026
## Summary

Hardens the per-run manifest snapshot layer so scans are pinned to the
manifest captured at index creation time. This protects the KV-only scan
path from accidentally falling back to a later live catalog read.

## Changes

- Add regression coverage for create-time manifest snapshot stability.
- Prove `scan` does not reread a changed live manifest after `create`.
- Assert the scanned job keeps the original snapshot fingerprint and
source revision.

## Call Stack

```text
ToolSearchIndex.create()
  -> listToolManifests() once
  -> partition manifests
  -> write index-manifest/v1/{runId}/{partition} to executor.cache

ToolSearchIndex.scan()
  -> read partition snapshot from executor.cache
  -> never call executor.tools.manifest()
  -> materialize jobs from the snapshot
```
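
A rough sketch of the pinning behavior described above, using an in-memory `Map` as a stand-in for `executor.cache`. The `SnapshotEntry` shape, the `createSnapshot` and `scanPartition` helpers, JSON serialization, and round-robin partitioning are assumptions for illustration; only the key layout `index-manifest/v1/{runId}/{partition}` comes from the call stack.

```typescript
// Hypothetical shapes; a Map stands in for executor.cache.
type SnapshotEntry = {
  path: string;
  indexFingerprint: string;
  sourceRevision?: string;
};
type SnapshotCache = Map<string, string>;

const snapshotKey = (runId: string, partition: number): string =>
  `index-manifest/v1/${runId}/${partition}`;

// create(): read the live manifest once, then persist one snapshot per partition.
function createSnapshot(
  cache: SnapshotCache,
  runId: string,
  listToolManifests: () => SnapshotEntry[],
  partitionCount: number,
): void {
  const manifests = listToolManifests(); // the only live read
  for (let p = 0; p < partitionCount; p++) {
    // Round-robin partitioning is an assumption made for this sketch.
    const slice = manifests.filter((_, i) => i % partitionCount === p);
    cache.set(snapshotKey(runId, p), JSON.stringify(slice));
  }
}

// scan(): read only from the snapshot; never consult the live manifest.
function scanPartition(
  cache: SnapshotCache,
  runId: string,
  partition: number,
): SnapshotEntry[] {
  const raw = cache.get(snapshotKey(runId, partition));
  if (raw === undefined) {
    throw new Error(`no snapshot for run ${runId}, partition ${partition}`);
  }
  return JSON.parse(raw) as SnapshotEntry[];
}
```

Because `scanPartition` takes no reference to the live manifest source, a catalog change after `createSnapshot` cannot leak into the jobs for that run.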

## Tests

- `bun run --cwd packages/plugins/semantic-search test --
src/sdk/tool-search-index.test.ts`
- `bun run --cwd packages/plugins/semantic-search typecheck`
- `bunx oxfmt --check
packages/plugins/semantic-search/src/sdk/tool-search-index.test.ts`
- `bunx oxlint -c .oxlintrc.jsonc --deny-warnings
packages/plugins/semantic-search/src/sdk/tool-search-index.test.ts`

## Stack

Base: #64

<!-- stack:links:start -->
### [Stack](https://github.com/aryasaatvik/stack)

1. #63
2. #64
3. **#65** 👈 current
<!-- stack:links:end -->
@aryasaatvik aryasaatvik deleted the perf/semantic-search-manifest-indexing branch June 26, 2026 14:47