feat(search): collapse over-long separators in recalled snippets #819
Open
niemst wants to merge 6 commits into
Decorative separators (= * 80 banners, --- rules) carried in indexed content are echoed verbatim on ctx_search recall, spending context tokens for ~zero information. Collapse a 12+ char single-char run to ccc…×N (lossless about length). Threshold 12 leaves markdown rules, fences, and diff hunks untouched. Applied only to the returned snippet, so FTS5 highlight offsets are unaffected.
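The collapse rule described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the helper name `collapseRuns` and the constant `MIN_RUN` are hypothetical, and the real logic inside `extractSnippet` may be structured differently. Only the threshold of 12, the `\S`-only matching, and the `ccc…×N` output shape come from the description.

```typescript
// Hypothetical helper for illustration; the PR's real implementation may differ.
// Threshold taken from the PR description: runs of 12+ identical chars collapse.
const MIN_RUN = 12;

function collapseRuns(snippet: string): string {
  // (\S) captures one non-whitespace char; \1{11,} requires at least 11 more
  // copies of it, so only runs of 12 or more match. Whitespace never matches.
  const pattern = new RegExp(`(\\S)\\1{${MIN_RUN - 1},}`, "g");
  // Keep three copies of the char and record the original run length,
  // so the collapsed form still states how long the run was.
  return snippet.replace(
    pattern,
    (run: string, ch: string) => `${ch.repeat(3)}…×${run.length}`,
  );
}

// Usage: an 80-char banner collapses, short markup stays as written.
console.log(collapseRuns("=".repeat(80))); // ===…×80
console.log(collapseRuns("---"));          // ---
```

Because the sketch only rewrites the string it is given, it fits the description's point that the collapse is applied to the returned snippet rather than to the indexed content.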
What / Why / How
ctx_search echoes indexed content verbatim, so decorative separators (= ×80 banners, long --- rules) are recalled in full, spending context tokens for ~zero information.

extractSnippet now collapses any single non-whitespace char repeated 12+ times to ccc…×N, lossless about the original length.

- Threshold 12 leaves short markup untouched: --- rules, ***/___, ~~~ fences, @@ diff hunks.
- Matches \S only, so whitespace/newline structure is never touched.

Before
After
Affected platforms
Recall snippet formatting is platform-agnostic.
Test plan
- Added 5 cases to tests/core/search.test.ts (§ extractSnippet collapses over-long separators): 80-char banner → ===…×80, short markup untouched, lossless ×N, whitespace runs untouched, collapse survives the windowed/truncated path.
- npm test: 4244 pass (2 pre-existing macOS locale/PWD env failures, unrelated to this change).
- npm run typecheck: clean.

Bundles not rebuilt on purpose: bundle.yml regenerates them on merge to main, and next's committed bundle is already stale vs source, so a local rebuild would only add unrelated churn.

Checklist
- npm test passes (2 unrelated pre-existing env failures)
- npm run typecheck passes
- next branch