feat(semantic-search): add cloudflare ai search backend #68
Greptile Summary
This PR replaces the Vectorize + Gemini embedding backend for semantic tool search with Cloudflare AI Search, which handles chunking, embedding, and retrieval natively. The switch eliminates the Gemini API key dependency and simplifies the plugin wiring to a single
Confidence Score: 4/5
Safe to merge after fixing the self-deletion in reindexAiSearch for re-indexed tools.
Cloudflare AI Search's upload() is an upsert-by-name: uploading an item with the same filename updates the existing record and returns the same id. When a tool's fingerprint changes and its document is re-uploaded, uploaded.id equals previous.itemId, so the immediately following deleteItemBestEffort(aiSearch, previous.itemId) deletes the document that was just updated. The tool disappears from the search index until the next full reindex. No test covers the path where a changed tool is re-indexed, so the regression is not caught automatically.
The files involved are packages/plugins/semantic-search/src/sdk/ai-search.ts (reindexAiSearch stale-delete block) and packages/plugins/semantic-search/src/sdk/ai-search.test.ts (missing coverage for the fingerprint-changed update path).
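The self-deletion described above can be illustrated with a minimal sketch. It assumes the upsert-by-name behavior exactly as the review states it and models it with an in-memory fake; `FakeAiSearch`, `StoredItem`, `UploadedItem`, `staleItemId`, and `reindexOne` are hypothetical names for illustration, not the code in ai-search.ts. The guard only deletes the previous item when its id differs from the id returned by the upload.

```typescript
// Hypothetical shapes; the real types in ai-search.ts may differ.
interface StoredItem {
  itemId: string;
  fingerprint: string;
}

interface UploadedItem {
  id: string;
}

// In-memory stand-in that models the upsert-by-name behavior the review
// describes: uploading the same name again returns the same id.
class FakeAiSearch {
  private idsByName: Record<string, string> = {};
  private nextId = 1;

  upload(name: string): UploadedItem {
    if (!(name in this.idsByName)) {
      this.idsByName[name] = "item-" + this.nextId;
      this.nextId += 1;
    }
    return { id: this.idsByName[name] };
  }

  deleteById(id: string): void {
    for (const name of Object.keys(this.idsByName)) {
      if (this.idsByName[name] === id) delete this.idsByName[name];
    }
  }

  has(name: string): boolean {
    return name in this.idsByName;
  }
}

// Returns the id that is safe to delete after an upload, or undefined
// when there is nothing stale to remove.
function staleItemId(
  previous: StoredItem | undefined,
  uploaded: UploadedItem,
): string | undefined {
  if (previous === undefined) return undefined; // new tool, nothing to clean up
  if (previous.itemId === uploaded.id) return undefined; // upsert reused the id
  return previous.itemId;
}

// Re-indexes a single tool document and returns the row to store.
function reindexOne(
  index: FakeAiSearch,
  name: string,
  fingerprint: string,
  previous: StoredItem | undefined,
): StoredItem {
  if (previous !== undefined && previous.fingerprint === fingerprint) {
    return previous; // fingerprint unchanged: skip
  }
  const uploaded = index.upload(name);
  const stale = staleItemId(previous, uploaded);
  if (stale !== undefined) index.deleteById(stale);
  return { itemId: uploaded.id, fingerprint };
}
```

A regression test for the fingerprint-changed path could follow the same shape: index a tool, change its fingerprint, reindex, and assert that the document is still present.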
Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant Host as host-cloudflare
participant Plugin as semantic-search plugin
participant Storage as Plugin Storage (aiSearchItems)
participant AISearch as Cloudflare AI Search
Note over Host,AISearch: Reindex flow
Host->>Plugin: reindex(executor)
Plugin->>Plugin: listToolManifests()
Plugin->>Storage: items.list() — fetch existing rows
loop For each manifest
Plugin->>Plugin: toolItemKey(manifest) — compute fingerprint
alt Fingerprint unchanged
Plugin-->>Plugin: skip
else Fingerprint changed or new
Plugin->>Plugin: collectToolSearchDocument() — build markdown doc
Plugin->>AISearch: items.upload(name, content, metadata)
Note right of AISearch: Upsert-by-name: same name = same ID
AISearch-->>Plugin: "{ id, key }"
Plugin->>Storage: "items.put(key, { itemId, fingerprint, status:queued })"
opt "previous exists (BUG: previous.itemId === uploaded.id)"
Plugin->>AISearch: items.delete(previous.itemId) deletes just-uploaded doc
end
end
end
loop Stale entries
Plugin->>Storage: items.remove(key)
Plugin->>AISearch: items.delete(itemId) best-effort
end
Plugin-->>Host: "{ indexed, skipped, removed }"
Note over Host,AISearch: Search flow
Host->>Plugin: search(query, namespace, limit)
Plugin->>AISearch: "search({ messages, ai_search_options })"
AISearch-->>Plugin: "{ chunks[] }"
Plugin->>Storage: items.list() — build key to row map
Plugin->>Plugin: deduplicate chunks by path (best score)
Plugin->>Plugin: filter by namespace, paginate
Plugin-->>Host: "{ items[], total }"
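The post-processing steps of the search flow in the diagram (deduplicate chunks by path keeping the best score, filter by namespace, paginate) can be sketched as a pure function. This is an illustration under assumptions, not the plugin's code: the `Chunk` and `Row` shapes, the lookup of rows by chunk path, the `offset` parameter, and the name `rankTools` are all hypothetical.

```typescript
// Hypothetical shapes for illustration only.
interface Chunk {
  path: string;
  score: number;
}

interface Row {
  key: string;
  namespace: string;
}

interface RankedTool {
  key: string;
  score: number;
}

function rankTools(
  chunks: Chunk[],
  rowsByPath: Record<string, Row>, // assumed: stored rows keyed by chunk path
  namespace: string | undefined,
  offset: number,
  limit: number,
): { items: RankedTool[]; total: number } {
  // Keep only the best score seen for each document path.
  const bestByPath: Record<string, number> = {};
  for (const chunk of chunks) {
    if (!(chunk.path in bestByPath) || chunk.score > bestByPath[chunk.path]) {
      bestByPath[chunk.path] = chunk.score;
    }
  }

  // Map paths back to stored rows and apply the namespace filter.
  const ranked: RankedTool[] = [];
  for (const path of Object.keys(bestByPath)) {
    if (!(path in rowsByPath)) continue; // chunk with no stored row
    const row = rowsByPath[path];
    if (namespace !== undefined && row.namespace !== namespace) continue;
    ranked.push({ key: row.key, score: bestByPath[path] });
  }

  // Highest score first, then take the requested page.
  ranked.sort((a, b) => b.score - a.score);
  return { items: ranked.slice(offset, offset + limit), total: ranked.length };
}
```

Here `total` counts all matches after filtering, before pagination, which is one plausible reading of the `{ items[], total }` response in the diagram.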
Reviews (7): Last reviewed commit: "fix(semantic-search): tolerate ai search..."
80f2fe7 to eb1abc7
6063ba3 to 02ad2ce
874dfe2 to 5b79372
## Summary

Introduce the semantic search backend contract and move the current vector implementation behind it.

## Changes

- Add `ToolSearchBackend` as the plugin-level backend boundary.
- Keep the existing vector/Gemini implementation available through `ToolSearchBackend.vector`.
- Wire the Cloudflare host through the backend option.

## Tests

- `bun run --cwd packages/plugins/semantic-search typecheck`
- `bun run --cwd packages/plugins/semantic-search test src/sdk/plugin.test.ts`
- `bun run --cwd apps/host-cloudflare typecheck`
- `oxfmt --check apps/host-cloudflare/src/plugins.ts packages/plugins/semantic-search/src/sdk/index.ts packages/plugins/semantic-search/src/sdk/plugin.ts packages/plugins/semantic-search/src/sdk/plugin.test.ts packages/plugins/semantic-search/src/sdk/tool-search-backend.ts`
- `oxlint -c .oxlintrc.jsonc apps/host-cloudflare/src/plugins.ts packages/plugins/semantic-search/src/sdk/index.ts packages/plugins/semantic-search/src/sdk/plugin.ts packages/plugins/semantic-search/src/sdk/plugin.test.ts packages/plugins/semantic-search/src/sdk/tool-search-backend.ts --deny-warnings`

<!-- stack:links:start -->
### [Stack](https://github.com/aryasaatvik/stack)

1. **#67** 👈 current
2. #68
<!-- stack:links:end -->
2050a5a to 0fe5412
Summary
Add the Cloudflare AI Search backend for the semantic search backend contract.
Changes
Tests
- `bun run --cwd packages/plugins/semantic-search typecheck`
- `bun run --cwd packages/plugins/semantic-search test src/sdk/plugin.test.ts src/sdk/ai-search.test.ts`
- `bun run --cwd apps/host-cloudflare typecheck`
- `oxfmt --check apps/host-cloudflare/src/app.ts apps/host-cloudflare/src/config.ts apps/host-cloudflare/src/execution.ts apps/host-cloudflare/src/plugins.ts apps/host-cloudflare/wrangler.jsonc packages/plugins/semantic-search/src/api/group.ts packages/plugins/semantic-search/src/sdk/ai-search.ts packages/plugins/semantic-search/src/sdk/ai-search.test.ts packages/plugins/semantic-search/src/sdk/collections.ts packages/plugins/semantic-search/src/sdk/documents.ts packages/plugins/semantic-search/src/sdk/index.ts packages/plugins/semantic-search/src/sdk/plugin.ts packages/plugins/semantic-search/src/sdk/tool-search-backend.ts`
- `oxlint -c .oxlintrc.jsonc apps/host-cloudflare/src/app.ts apps/host-cloudflare/src/config.ts apps/host-cloudflare/src/execution.ts apps/host-cloudflare/src/plugins.ts packages/plugins/semantic-search/src/api/group.ts packages/plugins/semantic-search/src/sdk/ai-search.ts packages/plugins/semantic-search/src/sdk/ai-search.test.ts packages/plugins/semantic-search/src/sdk/collections.ts packages/plugins/semantic-search/src/sdk/documents.ts packages/plugins/semantic-search/src/sdk/index.ts packages/plugins/semantic-search/src/sdk/plugin.ts packages/plugins/semantic-search/src/sdk/tool-search-backend.ts --deny-warnings`

Stack