Skip to content

feat(knowledge): add-source conflict dialog and directory source/relativePath model#16189

Merged
0xfullex merged 4 commits into
CherryHQ:mainfrom
eeee0717:feat/knowledge-add-conflict-dialog
Jun 19, 2026
Merged

feat(knowledge): add-source conflict dialog and directory source/relativePath model#16189
0xfullex merged 4 commits into
CherryHQ:mainfrom
eeee0717:feat/knowledge-add-conflict-dialog

Conversation

@eeee0717

@eeee0717 eeee0717 commented Jun 18, 2026

Copy link
Copy Markdown
Collaborator

What this PR does

Before this PR, the v2 knowledge base "add data source" flow added items immediately with no same-name handling, picked file/folder sources through in-app dropzone/folder panels, and modeled directory items with a redundant path field duplicating source.

After this PR:

  • Same-name conflict dialog on add — conflicts are detected per type (file/directory by source basename, note by first line, url by the raw data.url) against existing root items plus earlier items in the same submit batch, then a Finder-style dialog offers one decision for the whole batch: 保留全部 (keep all, auto _N rename) / 替换 (replace — the incoming source wins) / 取消 (abort the batch).
  • Native OS pickers — file and directory sources open the OS picker directly, removing the in-app dropzone/folder panels while reusing the same conflict flow.
  • Directory source/relativePath model — directory items are modeled as source (the original folder, re-scanned on expansion) + relativePath (the deduped raw/<name> prefix, written back on first expansion via updateDirectoryRelativePath); expanded children are stored under that deduped prefix (with _N on collision) instead of the owner UUID. Item titles show the deduped relativePath basename, else the source basename.
  • Delete removes the whole raw/<prefix> subtree and prunes the empty shells left behind.
CleanShot 2026-06-18 at 17 01 46@2x
CleanShot.2026-06-18.at.20.14.37.mp4

Fixes #
N/A — from internal QA / design feedback on the v2 knowledge base, no public issue.

Why we need it and why it was done in this way

Adding a source whose name already exists previously failed silently or collided on disk; this gives the user an explicit, Finder-like choice. Detection key and display name are deliberately separated so the dialog shows the list title the user recognizes while matching on a stable key (url keys off the raw URL because the post-index snapshot title is absent at add-time).

The following tradeoffs were made:

  • 替换 is implemented as synchronous delete-old + add-new inside the addItems mutation lock (not the async delete queue) — otherwise the new file would be _N-renamed before the delete lands and 替换 would degrade to 保留全部.
  • addItems(baseId, inputs, conflictStrategy = 'rename') folds detection into the existing call (default 'rename' preserves internal callers like restore/migrator); no new IPC channel.
  • Directory items keep the original folder in source and only the deduped raw/ prefix in relativePath, mirroring the file source/relativePath split, so re-expansion always re-scans the real folder.

The following alternatives were considered:

  • A separate precheck IPC channel for conflict detection (rejected — TOCTOU between precheck and add, and an extra round-trip when there is no conflict).
  • In-place reuse on 替换 (rejected — doesn't fit the create-centric add loop or same-batch "last wins").

Breaking changes

None for users. Data-model note: the directory item data shape changes from { source, path } to { source, relativePath? }; knowledge_item.data is a JSON column, so there is no schema/DDL migration. The v2 knowledge base is still pre-release.

Special notes for your reviewer

Checklist

  • Branch: This PR targets the correct branch — main for active development, v1 for v1 maintenance fixes
  • PR: The PR description is expressive enough and will help future contributors
  • Code: Write code that humans can understand and Keep it simple
  • Refactor: You have left the code cleaner than you found it (Boy Scout Rule)
  • Upgrade: Impact of this change on upgrade flows was considered (none — JSON data column, no migration)
  • Documentation: A user-guide update was considered; not required (v2 knowledge base is still evolving)
  • Self-review: I have reviewed my own code before requesting review from others

Release note

Knowledge base: adding a data source whose name already exists now shows a conflict dialog (keep all / replace / cancel) instead of silently failing or colliding; file and folder sources now open the native OS picker.

@eeee0717 eeee0717 requested a review from a team June 18, 2026 08:21
@eeee0717 eeee0717 force-pushed the feat/knowledge-add-conflict-dialog branch from 9e8884f to 7e6c86b Compare June 18, 2026 08:50
@eeee0717 eeee0717 marked this pull request as draft June 18, 2026 09:04
@eeee0717 eeee0717 force-pushed the feat/knowledge-add-conflict-dialog branch from 7e6c86b to 006d18d Compare June 18, 2026 09:06
@eeee0717 eeee0717 marked this pull request as ready for review June 18, 2026 09:07

@zhangjiadi225 zhangjiadi225 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multi-agent review: 6 parallel module reviewers + 1 adversarial verifier (every finding stress-tested). CI is green. The change is well-structured — lock ordering (cancel outside / purge inside), vector-before-row deletion, best-effort file delete, the additive IPC contract, and the path -> relativePath rename are all sound. The items below should be addressed before merge.

Blockers

  • B1 — replace purges only one of several same-name roots (inline on KnowledgeWorkflowService.ts). resolveKnowledgeAddConflicts keeps only the first existing root per (type, conflictKey), but keep-all routinely creates duplicates (two report.pdf from different folders; two notes with the same first line). A replace then leaves a stale duplicate, violating the documented contract.

  • B2 — directory raw/ prefix drifts on every reindex (inline on prepare.ts). collectReservedTopLevelNames counts the container's own relativePath as reserved against itself, so re-expansion dedups docs -> docs_1 -> docs_2 ..., churning on-disk material and the display title with no user action. Fix: pass excludeItemId for the container.

  • B3 — raw NUL (0x00) byte in addConflicts.ts:30. The conflict-map-key separator is a literal 0x00 control byte inside the template literal, instead of the escape sequence backslash-zero. The accompanying comment is correct; only the literal is wrong. Consequences: (1) git classifies the whole file as binary (verified — git diff --numstat reports dashes), so it has no reviewable diff or blame and cannot receive inline review comments; (2) it is fragile — a formatter, editor, or copy-paste can silently drop the byte, after which the separator becomes empty and toConflictMapKey loses its anti-aliasing guarantee. Fix: replace the raw byte with the escape sequence backslash-zero (or backslash-u-0000).

Warning

  • W1 — dotted directory names mangled by extension-based dedup (inline on directory.ts). On collision a folder report.v2 becomes report_1.v2 instead of report.v2_1. Cosmetic, but compounds with B2 (each reindex forces a self-collision).

Notices

  • N2 (inline) — conflict-dialog buttons are all variant=outline; per DESIGN.md the primary action should use emphasis and the destructive Replace should use destructive.
  • N3 (inline) — empty-content notes share an empty conflict key, so two blank notes are treated as same-name.
  • Nit (no inline anchor — outside the diff): useAddKnowledgeItems.test.ts:35 still constructs a directory input with the removed path field; stale leftover of the rename, worth cleaning up.

Comment thread src/main/features/knowledge/utils/sources/prepare.ts Outdated
Comment thread src/main/features/knowledge/utils/sources/directory.ts Outdated
case 'directory':
return getKnowledgePathBasename(data.source || '')
case 'note':
return getKnowledgeNoteFirstLine(data.content || '')

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[A2 · Notice] For a note, getKnowledgeItemConflictKey returns the first non-empty line of data.content, which is the empty string for blank content. The runtime add schema does not reject this — RuntimeNoteItemDataSchema.content is z.string().max(...) with no .min(1) (unlike url/source/relativePath). So two empty-content notes produce the same empty conflict key and are treated as same-name. Rare (notes come from picked files) but reachable — consider a unique fallback key, or skipping conflict detection, when the note key is empty.

Comment on lines +71 to +87
<Button
variant="outline"
onClick={() => onResolve('rename')}
loading={pendingResolution === 'rename'}
disabled={isResolving}>
{t('knowledge.data_source.add_dialog.conflict_dialog.keep_all')}
</Button>
<Button
variant="outline"
onClick={() => onResolve('replace')}
loading={pendingResolution === 'replace'}
disabled={isResolving}>
{t('knowledge.data_source.add_dialog.conflict_dialog.replace')}
</Button>
<Button variant="outline" onClick={onCancel} disabled={isResolving}>
{t('common.cancel')}
</Button>

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[C7 · Notice] All three footer buttons use variant="outline", flattening the action hierarchy in this new dialog — and Replace is destructive (it purges existing items). DESIGN.md prescribes emphasis for the primary action of a dialog and destructive for destructive ones, and the Button primitive supports both. Consider emphasis for the recommended action and destructive for Replace.

Comment on lines +167 to +173
private async purgeConflictingExistingItems(base: KnowledgeBase, itemsToAdd: KnowledgeAddItemInput[]): Promise<void> {
const currentRoots = await knowledgeItemService.getRootItemsByBaseId(base.id)
const { conflictingExistingRootIds } = resolveKnowledgeAddConflicts(itemsToAdd, currentRoots)
if (conflictingExistingRootIds.length === 0) {
return
}
const subtreeItems = await knowledgeItemService.getSubtreeItems(base.id, conflictingExistingRootIds, {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[A1 · Blocker] This replace purge (and the cancelActiveKnowledgeSubtreeJobs call at lines 90-93) drive off conflictingExistingRootIds, but resolveKnowledgeAddConflicts (addConflicts.ts) collects only the FIRST existing root per (type, conflictKey): existingByKey is filled under an if (!existingByKey.has(mapKey)) guard, and conflictingExistingRootIds is populated solely from that single existing.id. The default rename/keep-all flow routinely creates several roots sharing one conflict key — e.g. /a/report.pdf + /b/report.pdf (both source basename report.pdf), or two notes with the same first line (see KnowledgeService.integration.test.ts:284). A replace then purges only ONE of them and adds the incoming source, so a stale duplicate survives — violating the documented replace contract (Conflicting existing items are purged).

Fix: in resolveKnowledgeAddConflicts, collect ALL matching existing-root ids per key (e.g. Map<string, KnowledgeItem[]>) and return all of them; add a test with two existing same-key roots asserting both are purged on replace.

(Note: addConflicts.ts itself cannot take an inline comment — it is committed as a binary file due to a stray NUL byte; see the review summary.)

@eeee0717

Copy link
Copy Markdown
Collaborator Author

Thanks for the thorough multi-agent review — all seven items are addressed in 7ee96b28a. Every finding was real (none pushed back on); summary below, with tests pinning each behavioral fix and the full lint/typecheck/test gate re-run locally and green.

Fixed

  • B1 (addConflicts.ts) — resolveKnowledgeAddConflicts now groups existing roots per (type, conflictKey) into a Map<string, KnowledgeItem[]> and adds every member of the matched group to conflictingExistingRootIds. A replace therefore purges the whole duplicate set (e.g. both report.pdf and report_1.pdf), restoring the documented contract. The reported title stays stable via the first root in the group.
  • B2 (prepare.ts) — collectReservedTopLevelNames now receives the container's item.id as excludeItemId, so a reindex no longer counts the container's own pinned prefix against itself. docs stays docs across re-expansions instead of drifting to docs_1/docs_2.
  • B3 (addConflicts.ts:30) — replaced the literal 0x00 separator with the \0 escape sequence. The working tree now contains zero NUL bytes (verified), so the file is plain UTF-8 text again and the new blob carries a normal reviewable diff/blame.
  • W1 (knowledge.ts + directory.ts) — nextFreeKnowledgeRelativePath gained a splitExtension flag (default true, preserving the file-dedup callers); the directory-prefix caller passes false, so report.v2 dedupes to report.v2_1 rather than report_1.v2. Combined with B2 this removes the self-collision churn.
  • N2 (KnowledgeAddConflictDialog.tsx) — keep-all now uses variant="emphasis" (primary) and Replace uses variant="destructive", per DESIGN.md.
  • N3 (addConflicts.ts) — an empty detection key (blank-content note, blank URL) is now skipped in detection and always kept, so two blank notes are never treated as the same name.
  • Nit (useAddKnowledgeItems.test.ts) — removed the stale path field from the directory input fixture.

Tests
Added/updated coverage that fails if the corresponding fix is reverted: group-purge (addConflicts.test.ts), blank-note non-collision (addConflicts.test.ts), dotted-directory dedup at the real call site (directory.test.ts, since the isolated function test couldn't catch a dropped false argument), splitExtension=false at the unit level (knowledge.test.ts), and container self-exclusion on reindex (prepare.test.ts). Re-verified locally: type-aware oxlint + eslint + biome format:check + tsgo (node & web) + the affected vitest suites, all green.

@eeee0717 eeee0717 requested a review from zhangjiadi225 June 18, 2026 10:35

@zhangjiadi225 zhangjiadi225 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All review feedback addressed in 7ee96b2 — the two purge/prefix-drift blockers, the raw NUL byte (file is no longer binary), the dotted-directory dedup, the dialog button hierarchy, and the empty-note conflict key are all fixed, each with a targeted regression test. CI is green. LGTM.

@eeee0717 eeee0717 force-pushed the feat/knowledge-add-conflict-dialog branch 2 times, most recently from fcf00d0 to 649cde0 Compare June 18, 2026 12:24
eeee0717 and others added 3 commits June 19, 2026 10:21
…tivePath model

Rebuilt on top of the directory field-rename PR. Bundles the knowledge
add-source work and repurposes `relativePath` from the rename:

- same-name conflict dialog on add (keep-all / replace / cancel)
- open the OS picker directly for file & directory sources
- store directory children under the deduped `raw/<name>` prefix (with
  `_N` suffix on collision), not the owner UUID
- model directory items as `source` (original folder) + `relativePath`
  (deduped `raw/` prefix written back on first expansion); expansion now
  scans `source`, the prefix is pinned via updateDirectoryRelativePath
- item titles show the deduped `relativePath` basename, else `source`
- delete removes the whole `raw/<prefix>` subtree and prunes empty shells

Squashed rebuild of the former feat/knowledge-add-conflict-dialog
history onto the rename PR; final tree is identical to the validated
combined state.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: eeee0717 <chentao020717Work@outlook.com>
- purge every existing root sharing a conflict key on replace, not just the
  first — keep-all can leave several roots under one name (report.pdf,
  report_1.pdf), and replace must overwrite the whole group
- exclude a directory from its own reserved-name set on reindex so its
  pinned relativePath prefix is not self-deduped to `_1` every run
- replace the raw NUL conflict-map-key separator with a `\0` escape so the
  source file stays UTF-8 text instead of a binary blob
- keep a dotted directory prefix intact when deduping (`report.v2_1`, not
  `report_1.v2`) via a splitExtension flag on nextFreeKnowledgeRelativePath
- never treat blank-content notes as the same name: an empty detection key
  is skipped in detection and always kept
- conflict dialog: use emphasis (keep-all) and destructive (replace) button
  variants to signal the safe vs lossy action

Tests pin each behavioral fix: group purge, blank-note non-collision, dotted
directory dedup at the call site, and container self-exclusion on reindex.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: eeee0717 <chentao020717Work@outlook.com>
…ll same-source siblings

Add-source conflict detection keyed file/directory off the source basename,
so N kept copies of `test.md` (stored test.md / test_2.md / test_3.md)
collapsed onto one detection key. `replace` then purged every sibling
instead of the single copy the incoming source overwrites.

Key file/directory off `relativePath` (the deduped raw/ name) when present,
else the source basename: an add-input has no relativePath so detection
still fires on the source basename, while an existing item keys off its
unique deduped name, so replace targets only relativePath `test.md` and
leaves test_2.md / test_3.md intact. This also aligns the key with
getKnowledgeItemDisplayTitle. url/note are unchanged — their deduped name
is a post-index snapshot absent at add-time, so they must keep keying off
the raw url / first line.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: eeee0717 <chentao020717Work@outlook.com>
@eeee0717 eeee0717 force-pushed the feat/knowledge-add-conflict-dialog branch from 649cde0 to 05ae1a0 Compare June 19, 2026 02:24
@DeJeune DeJeune added the ready-to-merge This PR has sufficient reviewer approvals but is awaiting code owner approval. label Jun 19, 2026
@0xfullex 0xfullex merged commit f98b4f2 into CherryHQ:main Jun 19, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-to-merge This PR has sufficient reviewer approvals but is awaiting code owner approval.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants