
perf(semantic-search): snapshot manifest partitions#65

Merged
aryasaatvik merged 2 commits into dev from perf/semantic-search-manifest-snapshots
Jun 26, 2026

Conversation

@aryasaatvik aryasaatvik commented Jun 24, 2026

Owner

Summary

Hardens the per-run manifest snapshot layer so scans are pinned to the manifest captured at index creation time. This protects the KV-only scan path from accidentally falling back to a later live catalog read.

Changes

  • Add regression coverage for create-time manifest snapshot stability.
  • Prove scan does not reread a changed live manifest after create.
  • Assert the scanned job keeps the original snapshot fingerprint and source revision.

Call Stack

ToolSearchIndex.create()
  -> listToolManifests() once
  -> partition manifests
  -> write index-manifest/v1/{runId}/{partition} to executor.cache

ToolSearchIndex.scan()
  -> read partition snapshot from executor.cache
  -> never call executor.tools.manifest()
  -> materialize jobs from the snapshot
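
The call stack above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the plugin's actual code: `makeSnapshotIndex`, the `Manifest` shape, the in-memory `Map` standing in for `executor.cache`, and the round-robin partitioning are all assumptions made for the example. Only the key layout `index-manifest/v1/{runId}/{partition}` is taken from the description.

```typescript
// Simplified, hypothetical shapes: the real ToolSearchIndex and executor APIs may differ.
type Manifest = { path: string; indexFingerprint: string; sourceRevision: string };

function makeSnapshotIndex(listManifests: () => Manifest[], cache: Map<string, string>) {
  const key = (runId: string, partition: number) => `index-manifest/v1/${runId}/${partition}`;
  return {
    // Read the live manifest exactly once, then pin each partition under the run ID.
    create(runId: string, partitionCount: number): void {
      const all = listManifests();
      for (let p = 0; p < partitionCount; p++) {
        // Round-robin assignment is an assumption made for this sketch.
        const slice = all.filter((_, i) => i % partitionCount === p);
        cache.set(key(runId, p), JSON.stringify(slice));
      }
    },
    // Materialize from the pinned snapshot only; listManifests is never called here.
    scan(runId: string, partition: number): Manifest[] {
      const raw = cache.get(key(runId, partition));
      if (raw === undefined) throw new Error(`no snapshot for ${runId}/${partition}`);
      return JSON.parse(raw) as Manifest[];
    },
  };
}

// Usage: mutate the live catalog between create and scan.
let liveManifests: Manifest[] = [
  { path: "tools/a", indexFingerprint: "fp-at-create", sourceRevision: "source-at-create" },
];
let manifestReads = 0;
const index = makeSnapshotIndex(() => {
  manifestReads += 1;
  return liveManifests;
}, new Map());

index.create("run-1", 1);
liveManifests = [
  { path: "tools/a", indexFingerprint: "fp-after-create", sourceRevision: "source-after-create" },
];
const scanned = index.scan("run-1", 0);
```

In this sketch `scanned` still carries the create-time fingerprint and source revision, and the live manifest reader has run only once, which is the invariant the regression test is meant to protect.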

Tests

  • bun run --cwd packages/plugins/semantic-search test -- src/sdk/tool-search-index.test.ts
  • bun run --cwd packages/plugins/semantic-search typecheck
  • bunx oxfmt --check packages/plugins/semantic-search/src/sdk/tool-search-index.test.ts
  • bunx oxlint -c .oxlintrc.jsonc --deny-warnings packages/plugins/semantic-search/src/sdk/tool-search-index.test.ts

Stack

Base: #64

Stack

  1. feat(sdk): harden tool manifest projection for indexing #63
  2. perf(semantic-search): index from manifest projection #64
  3. perf(semantic-search): snapshot manifest partitions #65 👈 current

greptile-apps Bot commented Jun 24, 2026


Greptile Summary

This PR adds a single regression test to tool-search-index.test.ts that proves scan() reads the manifest snapshot captured at create() time rather than re-querying the live manifest after it changes.

  • Injects a mutable liveManifests closure, calls create(), then mutates the closure to simulate a live catalog update, and asserts that the subsequent scan() job still carries the original "fp-at-create" fingerprint and "source-at-create" source revision.
  • Guards the destructured tool variable with both an explicit expect(tool).toBeDefined() and a TypeScript narrowing guard, ensuring the test fails loudly if makeTools(1) ever returns an empty array.

Confidence Score: 5/5

Test-only change that adds targeted regression coverage with no production code modifications; safe to merge.

The change is confined to a single test file and adds one well-structured Effect test. It correctly mutates the live manifest after create() and asserts that scan() reads from the KV snapshot, verifying the isolation guarantee end-to-end. The expect(tool).toBeDefined() guard ensures the test fails explicitly if the fixture changes, and all assertions target the right observable state.

No files require special attention.

Important Files Changed

Filename Overview
packages/plugins/semantic-search/src/sdk/tool-search-index.test.ts Adds one new Effect-based test that verifies the manifest snapshot isolation guarantee; test structure, assertions, and helper reuse all look correct.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Test
    participant ToolSearchIndex
    participant executor.tools.manifest
    participant executor.cache (KV)

    Test->>ToolSearchIndex: "create(runId, partitionCount=1)"
    ToolSearchIndex->>executor.tools.manifest: "manifest() [call #1]"
    executor.tools.manifest-->>ToolSearchIndex: "[{ fp-at-create, source-at-create }]"
    ToolSearchIndex->>executor.cache (KV): write index-manifest/v1/runId/0

    Note over Test: mutate liveManifests to fp-after-create

    Test->>ToolSearchIndex: "scan(runId, partition=0, limit=10)"
    ToolSearchIndex->>executor.cache (KV): read index-manifest/v1/runId/0
    executor.cache (KV)-->>ToolSearchIndex: [{ fp-at-create, source-at-create }]
    Note over ToolSearchIndex: manifest() never called again
    ToolSearchIndex-->>Test: "{ processed:1, changed:1, skipped:0 }"
    Note over Test: asserts job.fingerprint == fp-at-create
    Note over Test: asserts counters.manifest == 1

Reviews (4): Last reviewed commit: "test(semantic-search): assert snapshot t..."

Comment thread packages/plugins/semantic-search/src/sdk/tool-search-index.test.ts
@aryasaatvik aryasaatvik force-pushed the perf/semantic-search-manifest-snapshots branch from 40d1f17 to a53c3bf Compare June 26, 2026 12:32
aryasaatvik added a commit that referenced this pull request Jun 26, 2026
## Summary

Hardens the tool manifest projection used by indexing so manifest reads
observe the same stale-catalog freshness behavior as `tools.list`, and
lets plugins persist explicit source revisions for produced tool
catalogs.

This keeps the existing separate `tool_schema_manifest` table and column
shape. The change is a greenfield semantic cleanup of the manifest
contract, not a table rewrite.

## Changes

- Run stale connection catalog sync before `executor.tools.manifest()`
reads manifest rows.
- Add optional `ResolveToolsResult.sourceRevision` so plugins can
describe the source revision that produced a catalog.
- Persist plugin-provided source revisions into manifest rows, falling
back to the existing generic integration/connection revision when
absent.
- Set OpenAPI source revisions from `specHash`.
- Set MCP source revisions from a hash of the discovered MCP tool
manifest.
- Add SDK regressions for stale manifest self-healing and
plugin-provided source revision persistence.
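
As a rough illustration of deriving a source revision from a discovered MCP tool manifest, the sketch below canonicalizes the tool list and hashes the result. Everything here is an assumption for the example: the `McpTool` shape, sorting tools by name, sorted-key serialization, and the hash itself. FNV-1a is used only to keep the block dependency-free; the commit does not say which hash the MCP plugin uses.

```typescript
// Hypothetical tool shape; the real MCP manifest type may carry more fields.
type McpTool = { name: string; description?: string; inputSchema?: unknown };

// Serialize with sorted object keys so logically equal manifests produce identical text.
function canonicalize(value: unknown): string {
  if (value === undefined) return "null";
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value as Record<string, unknown>)
      .filter(([, v]) => v !== undefined)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
    return `{${entries.map(([k, v]) => `${JSON.stringify(k)}:${canonicalize(v)}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

// 32-bit FNV-1a, a non-cryptographic stand-in chosen only for this sketch.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Sort by tool name first so discovery order does not change the revision.
function mcpSourceRevision(tools: McpTool[]): string {
  const sorted = [...tools].sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
  return fnv1a(canonicalize(sorted));
}
```

The design point is that the revision should change when the discovered tools change and stay stable otherwise, so ordering and key order are normalized before hashing.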

## Call Stack

```text
executor.tools.manifest(filter)
  -> syncStaleConnectionTools()
     -> compare integration.config_revised_at with connection.tools_synced_at
     -> produceConnectionTools() for stale connections
  -> read tool_schema_manifest rows
  -> apply filters and policy visibility
  -> return compact indexing projection
```
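
The staleness comparison in that call stack could look roughly like the following. The row types and the `staleConnections` helper are hypothetical; only the two timestamps being compared come from the call stack, and treating a never-synced connection as stale is an assumption.

```typescript
// Hypothetical row shapes; field names mirror the columns named in the call stack.
type Integration = { id: string; configRevisedAt: number };
type Connection = { id: string; integrationId: string; toolsSyncedAt: number | null };

// A connection is treated as stale when it has never synced, or when its
// integration config was revised after the last tool sync.
function staleConnections(integrations: Integration[], connections: Connection[]): Connection[] {
  const revisedAt = new Map(integrations.map((i) => [i.id, i.configRevisedAt] as const));
  return connections.filter((c) => {
    const revised = revisedAt.get(c.integrationId);
    if (revised === undefined) return false; // no matching integration: nothing to compare
    return c.toolsSyncedAt === null || c.toolsSyncedAt < revised;
  });
}
```

Only the connections returned by such a check would need `produceConnectionTools()` before the manifest rows are read.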

```text
plugin.resolveTools()
  -> returns tools, definitions, optional sourceRevision
produceConnectionTools()
  -> sourceRevision ?? generic integration/connection revision
  -> write tool rows, definition rows, manifest rows, catalog revision row
```
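
The fallback on the `sourceRevision ?? generic integration/connection revision` line can be shown in isolation. The `ResolveToolsResult` shape below is trimmed to the fields that matter here and `manifestSourceRevision` is a hypothetical helper name.

```typescript
// Trimmed, hypothetical version of the plugin result; the real type has more fields.
type ResolveToolsResult = { tools: string[]; sourceRevision?: string | undefined };

// Prefer the plugin-provided revision; fall back to the generic one only when it is absent.
function manifestSourceRevision(result: ResolveToolsResult, genericRevision: string): string {
  return result.sourceRevision ?? genericRevision;
}
```

Because `??` falls back only on `null` or `undefined`, a plugin that returns an empty string would have that empty string persisted; whether the SDK guards against that case is not stated here.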

## Tests

- `bun run --cwd packages/core/sdk test -- src/executor.test.ts`
- `bun run --cwd packages/core/sdk typecheck`
- `bun run --cwd packages/plugins/openapi test --
src/sdk/spec-blob.test.ts src/sdk/plugin.test.ts`
- `bun run --cwd packages/plugins/openapi typecheck`
- `bun run --cwd packages/plugins/mcp test -- src/sdk/plugin.test.ts`
- `bun run --cwd packages/plugins/mcp typecheck`
- `bun run --cwd packages/plugins/semantic-search test --
src/sdk/tool-search-index.test.ts`
- `bun run --cwd packages/plugins/semantic-search typecheck`
- `bunx oxfmt --check packages/core/sdk/src/plugin.ts
packages/core/sdk/src/executor.ts packages/core/sdk/src/executor.test.ts
packages/plugins/openapi/src/sdk/backing.ts
packages/plugins/mcp/src/sdk/plugin.ts`
- `bunx oxlint -c .oxlintrc.jsonc --deny-warnings
packages/core/sdk/src/plugin.ts packages/core/sdk/src/executor.ts
packages/core/sdk/src/executor.test.ts
packages/plugins/openapi/src/sdk/backing.ts
packages/plugins/mcp/src/sdk/plugin.ts`
- `git diff --check`

## Notes

No deploy was performed. This is one milestone PR rather than a stack
because the existing semantic-search indexing code already consumes
`tools.manifest()` and passed its targeted tests against this change.

<!-- stack:links:start -->
### [Stack](https://github.com/aryasaatvik/stack)

1. **#63** 👈 current
2. #64
3. #65
<!-- stack:links:end -->
@aryasaatvik aryasaatvik force-pushed the perf/semantic-search-manifest-indexing branch from 3beadf6 to 14f8258 Compare June 26, 2026 13:06
@aryasaatvik aryasaatvik force-pushed the perf/semantic-search-manifest-snapshots branch from a53c3bf to e25c61a Compare June 26, 2026 13:06
@aryasaatvik aryasaatvik changed the base branch from perf/semantic-search-manifest-indexing to dev June 26, 2026 14:41
aryasaatvik added a commit that referenced this pull request Jun 26, 2026
## Summary

Carries the tool manifest projection into semantic-search scan jobs by
retaining each manifest source revision alongside the fingerprint. This
makes index work traceable to the source catalog version that produced
it without adding schema reads to scan.

## Changes

- Add optional `sourceRevision` to semantic-search index jobs.
- Copy `ToolSchemaManifest.sourceRevision` into both skipped and changed
scan jobs.
- Add regression coverage that manifest source revisions survive scan
materialization.

## Call Stack

```text
ToolSearchIndex.create()
  -> executor.tools.manifest()
  -> write manifest snapshot

ToolSearchIndex.scan()
  -> read manifest projection
  -> compare manifest.indexFingerprint with stored fingerprint
  -> write IndexJob(sourceRevision, fingerprint, path, status)
```
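
A minimal sketch of the scan step, assuming simplified types: `materializeJobs` and the `stored` fingerprint map are hypothetical names, and the `"changed"` / `"skipped"` status values follow the wording used in the change list.

```typescript
// Hypothetical, simplified shapes for the manifest projection and the index job.
type ToolSchemaManifest = {
  path: string;
  indexFingerprint: string;
  sourceRevision?: string | undefined;
};
type IndexJob = {
  path: string;
  fingerprint: string;
  sourceRevision?: string | undefined;
  status: "changed" | "skipped";
};

// Compare each manifest fingerprint with the stored one and carry the source
// revision into the job regardless of whether the tool changed.
function materializeJobs(
  snapshot: ToolSchemaManifest[],
  stored: Map<string, string>,
): IndexJob[] {
  return snapshot.map((m): IndexJob => ({
    path: m.path,
    fingerprint: m.indexFingerprint,
    sourceRevision: m.sourceRevision,
    status: stored.get(m.path) === m.indexFingerprint ? "skipped" : "changed",
  }));
}
```

Copying `sourceRevision` on both branches is what makes skipped jobs as traceable as changed ones.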

## Tests

- `bun run --cwd packages/plugins/semantic-search test --
src/sdk/tool-search-index.test.ts`
- `bun run --cwd packages/plugins/semantic-search typecheck`
- `bunx oxfmt --check
packages/plugins/semantic-search/src/sdk/collections.ts
packages/plugins/semantic-search/src/sdk/tool-search-index.ts
packages/plugins/semantic-search/src/sdk/tool-search-index.test.ts`
- `bunx oxlint -c .oxlintrc.jsonc --deny-warnings
packages/plugins/semantic-search/src/sdk/collections.ts
packages/plugins/semantic-search/src/sdk/tool-search-index.ts
packages/plugins/semantic-search/src/sdk/tool-search-index.test.ts`

## Stack

Base: #63

<!-- stack:links:start -->
### [Stack](https://github.com/aryasaatvik/stack)

1. #63
2. **#64** 👈 current
3. #65
<!-- stack:links:end -->
Cover the manifest snapshot invariant that scans use the create-time manifest even if the live catalog changes before scan. This protects the KV-only scan path from accidental live-manifest fallback.
@aryasaatvik aryasaatvik force-pushed the perf/semantic-search-manifest-snapshots branch from e25c61a to 65d62e1 Compare June 26, 2026 14:41
@aryasaatvik aryasaatvik merged commit 92ae593 into dev Jun 26, 2026
8 checks passed
@aryasaatvik aryasaatvik deleted the perf/semantic-search-manifest-snapshots branch June 26, 2026 14:47