RHIDP-12952: persist interrupted conversation by Jdubrick · Pull Request #1971 · lightspeed-core/lightspeed-stack

Jdubrick · 2026-06-22T14:03:00Z

Description

When query interruption was first added, it replaced the entire interrupted portion of the conversation with the interrupt message. This change keeps the partially completed message instead, after repairing any code fences, HTML, tables, etc. that the interruption left broken.
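The HTML part of that repair can be sketched with the standard library's `html.parser.HTMLParser`: track which block-level tags are still open when the stream stops, then emit their closing tags in reverse order. This is a minimal sketch, not the PR's `close_open_markdown`; the `close_open_html` helper, the `BlockTagTracker` class, and the `BLOCK_TAGS` set are hypothetical, and the sketch ignores HTML that appears inside code fences.

```python
from html.parser import HTMLParser

# Hypothetical set of block-level tags to track; the PR's actual list may differ.
BLOCK_TAGS = {
    "blockquote", "details", "div", "li", "ol", "p", "pre",
    "summary", "table", "tbody", "td", "th", "thead", "tr", "ul",
}


class BlockTagTracker(HTMLParser):
    """Record which block-level tags are still open after feeding text."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.stack: list[str] = []

    def handle_starttag(self, tag: str, attrs: list) -> None:
        if tag in BLOCK_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag: str) -> None:
        # Pop back to the most recent matching open tag; ignore stray closers.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def close_open_html(text: str) -> str:
    """Return closing tags for block-level HTML left open in ``text``."""
    tracker = BlockTagTracker()
    tracker.feed(text)
    tracker.close()  # flush any buffered trailing data
    return "".join(f"</{tag}>" for tag in reversed(tracker.stack))
```

For a stream cut off at `<details><summary>Logs</summary><div><p>partial`, this returns `</p></div></details>`; inline tags such as `<span>` are not tracked.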

Type of change

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

Assisted-by: Claude (Cursor)
Generated by: Claude (Cursor)

Related Tickets & Documents

Related Issue https://redhat.atlassian.net/browse/RHIDP-12952
Closes https://redhat.atlassian.net/browse/RHIDP-12952

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

Please provide detailed steps to perform tests related to this code change.
How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

Release Notes

New Features
- Interrupted streaming responses now preserve already-generated partial text and automatically repair incomplete Markdown or HTML formatting before displaying an interruption notice.
Bug Fixes
- Improved handling of truncated content formatting when streams are interrupted.
Updates
- Updated interruption message to "Response stopped by the user."

Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>

coderabbitai · 2026-06-22T14:03:22Z

Warning

Review limit reached

@Jdubrick, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 36 minutes and 37 seconds. Learn how PR review limits work.


⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0ff64f7d-24cd-4bb2-8fdd-564a08992d61

📥 Commits

Reviewing files that changed from the base of the PR and between 6aeea11 and 969f5d6.

📒 Files selected for processing (2)

src/utils/markdown_repair.py
tests/unit/utils/test_markdown_repair.py

Walkthrough

The PR replaces the static interrupted-response message with a mechanism that accumulates streamed token deltas into TurnSummary.partial_tokens, tracks the next SSE chunk index in TurnSummary.next_chunk_id, repairs any unclosed Markdown via a new close_open_markdown utility, and emits the resulting suffix as a final token event on asyncio.CancelledError in both the agent and deprecated endpoint streaming paths.
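The repair step described above can be sketched as a simplified, hypothetical `build_interrupted_response`. The `naive_fence_suffix` helper only counts lines that start with three backticks, whereas the walkthrough says the real `close_open_markdown` also closes block-level HTML tags. The sketch also assumes `full_text` is just the joined tokens plus the suffix; how the PR attaches the interruption notice is not shown here.

```python
FENCE = "`" * 3  # a Markdown code-fence marker


def naive_fence_suffix(text: str) -> str:
    """Return a closing fence if an odd number of lines start with FENCE.

    Hypothetical and deliberately naive: ignores tildes, fence lengths,
    info strings, and indentation rules.
    """
    fence_lines = [ln for ln in text.split("\n") if ln.lstrip().startswith(FENCE)]
    if len(fence_lines) % 2 == 0:
        return ""
    # Put the closing fence on its own line.
    return ("" if text.endswith("\n") else "\n") + FENCE + "\n"


def build_interrupted_response(partial_tokens: list[str]) -> tuple[str, str]:
    """Join streamed deltas and return ``(full_text, suffix)``.

    ``full_text`` is what gets persisted; ``suffix`` is the only part the
    client has not seen yet, so it is what the final token event carries.
    """
    text = "".join(partial_tokens)
    suffix = naive_fence_suffix(text)
    return text + suffix, suffix
```

Returning the suffix separately is what allows the client to receive only the missing closing characters while the stored turn holds the complete, repaired text.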

Changes

Structured Interrupted Response from Partial Tokens

Layer / File(s)	Summary
TurnSummary fields and Markdown repair utility `src/models/common/turn_summary.py`, `src/utils/markdown_repair.py`, `tests/unit/utils/test_markdown_repair.py`	`TurnSummary` gains `partial_tokens: list[str]` and `next_chunk_id: int`. New `markdown_repair.py` exposes `close_open_markdown(text)`, which returns the minimal suffix to close any open code fence and block-level HTML tags; fully tested.
`build_interrupted_response` and persistence `src/constants.py`, `src/utils/stream_interrupts.py`, `tests/unit/utils/test_stream_interrupts.py`	`build_interrupted_response(partial_tokens)` joins tokens, repairs Markdown, and returns `(full_text, suffix)`. `_on_interrupt` and `persist_interrupted_turn` now use `turn_summary.llm_response` (set to `full_text`) instead of the static constant. `INTERRUPTED_RESPONSE_MESSAGE` wording updated.
Agent streaming: accumulation and cancellation suffix `src/utils/agents/streaming.py`, `tests/unit/utils/agents/test_streaming.py`	`_process_token` appends token text to `turn_summary.partial_tokens` and sets `turn_summary.next_chunk_id`; `AgentRunResultEvent` handler also sets `next_chunk_id`. On `CancelledError`, `generate_agent_response` calls `build_interrupted_response`, persists with `full_text`, and emits a `TokenStreamPayload` carrying `suffix` at `next_chunk_id`. New tests validate monotonic chunk IDs and zero-token cancellation.
Endpoint cancellation path and delta tracking `src/app/endpoints/streaming_query.py`, `tests/unit/app/endpoints/test_streaming_query.py`	`generate_response` cancellation handler replaced to call `build_interrupted_response` and emit `LLM_TOKEN_EVENT` with the suffix. `response_generator` sets `turn_summary.next_chunk_id` at kickoff, each delta, and completion; deltas are also appended to `turn_summary.partial_tokens`.
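The accumulation side of the change can be sketched with a trimmed-down `TurnSummary`. The dataclass below keeps only the two new fields plus `llm_response`; `record_token` and the payload dict shape are hypothetical stand-ins for `_process_token` and `TokenStreamPayload`. The sketch assumes `next_chunk_id` always points one past the last emitted chunk, which is what lets the interruption suffix reuse it without colliding with an earlier event.

```python
from dataclasses import dataclass, field


@dataclass
class TurnSummary:
    """Hypothetical subset of TurnSummary with the fields this PR adds."""

    llm_response: str = ""
    partial_tokens: list[str] = field(default_factory=list)
    next_chunk_id: int = 0


def record_token(turn_summary: TurnSummary, text: str) -> dict:
    """Append a streamed delta and return its (assumed) SSE payload."""
    chunk_id = turn_summary.next_chunk_id
    turn_summary.partial_tokens.append(text)
    # Advance before emitting so a later interruption suffix gets a fresh ID.
    turn_summary.next_chunk_id = chunk_id + 1
    return {"event": "token", "data": {"id": chunk_id, "token": text}}
```

Recording `"Hel"` and then `"lo"` yields payloads with IDs 0 and 1 and leaves `next_chunk_id` at 2, so a suffix event would be sent as chunk 2.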

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant Client
  participant Gen as generate_response / generate_agent_response
  participant Tok as _process_token / response_generator
  participant Build as build_interrupted_response
  participant Repair as close_open_markdown
  participant Persist as persist_interrupted_turn

  rect rgba(100, 149, 237, 0.5)
    note over Tok: Normal streaming
    Tok-->>Client: token SSE events (chunk_id 0, 1, 2…)
    Tok->>Gen: appends partial_tokens, sets next_chunk_id
  end

  rect rgba(220, 100, 100, 0.5)
    note over Gen: asyncio.CancelledError
    Gen->>Build: partial_tokens
    Build->>Repair: joined text
    Repair-->>Build: repair suffix
    Build-->>Gen: (full_text, emit_suffix)
    Gen->>Persist: llm_response = full_text
    Gen-->>Client: final token SSE event (emit_suffix, next_chunk_id)
  end
```
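The cancellation branch of the diagram can be sketched as an async generator. Everything here is hypothetical scaffolding: `stream_tokens`, the `persist` callback, the one-line fence stub standing in for `build_interrupted_response`, and the event dict shape. Re-raising `asyncio.CancelledError` after the final event follows the usual asyncio convention and is an assumption about the PR; whether that last event actually reaches the client also depends on the web server.

```python
import asyncio
from collections.abc import AsyncIterator, Callable
from dataclasses import dataclass, field

FENCE = "`" * 3


@dataclass
class TurnSummary:
    llm_response: str = ""
    partial_tokens: list[str] = field(default_factory=list)
    next_chunk_id: int = 0


def build_interrupted_response(partial_tokens: list[str]) -> tuple[str, str]:
    # Stub: close a fence if an odd number of fence markers was streamed.
    text = "".join(partial_tokens)
    suffix = f"\n{FENCE}\n" if text.count(FENCE) % 2 else ""
    return text + suffix, suffix


async def stream_tokens(
    deltas: AsyncIterator[str],
    turn_summary: TurnSummary,
    persist: Callable[[TurnSummary], None],
) -> AsyncIterator[dict]:
    try:
        async for text in deltas:
            chunk_id = turn_summary.next_chunk_id
            turn_summary.partial_tokens.append(text)
            turn_summary.next_chunk_id = chunk_id + 1
            yield {"id": chunk_id, "token": text}
    except asyncio.CancelledError:
        # Persist streamed text plus the repair suffix, then send only the suffix.
        full_text, suffix = build_interrupted_response(turn_summary.partial_tokens)
        turn_summary.llm_response = full_text
        persist(turn_summary)
        if suffix:
            yield {"id": turn_summary.next_chunk_id, "token": suffix}
        raise


async def demo() -> tuple[list[dict], list[TurnSummary]]:
    """Simulate a stream that is cancelled after two deltas."""

    async def deltas() -> AsyncIterator[str]:
        yield f"{FENCE}py\n"
        yield "x = 1"
        raise asyncio.CancelledError

    events: list[dict] = []
    persisted: list[TurnSummary] = []
    try:
        async for event in stream_tokens(deltas(), TurnSummary(), persisted.append):
            events.append(event)
    except asyncio.CancelledError:
        pass
    return events, persisted
```

Running `asyncio.run(demo())` produces chunk IDs 0, 1, and 2, where chunk 2 carries only the closing fence and the persisted `llm_response` holds the repaired text.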

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

lightspeed-core/lightspeed-stack#1870: Modifies the same deprecated generate_response/response_generator utilities in streaming_query.py and their agent stream-event wiring.
lightspeed-core/lightspeed-stack#1918: Changes the stream-interruption flow in streaming_query.py to delegate to stream_interrupts helpers (persist_interrupted_turn, interrupt callback), which this PR further extends.
lightspeed-core/lightspeed-stack#1919: Touches agent streaming SSE/interruption handling and persistence in utils/agents/streaming.py, the same file updated here.

Suggested reviewers

tisnik
jrobertboos

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: implementing persistence of interrupted conversations, which is the core feature of this PR.
Docstring Coverage	✅ Passed	Docstring coverage is 89.09% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/app/endpoints/streaming_query.py`:
- Around lines 637-640: The build_interrupted_response call at line 637 relies
solely on turn_summary.partial_tokens, which may be empty or incomplete if
cancellation occurs after response.output_text.done has populated
turn_summary.llm_response but before all deltas are processed. Modify the
build_interrupted_response call to use turn_summary.llm_response as a fallback
when partial_tokens is empty, ensuring that model output is not lost when
interrupted responses are reconstructed and persisted.

In `@src/utils/markdown_repair.py`:
- Around lines 75-90: In the fence closing logic (the elif condition checking
`char == fence_char and len(matched_group) >= fence_len`), add validation to
ensure that any trailing content after the fence marker contains only whitespace
characters (spaces and tabs). Extract the remainder of the line after the
matched fence group and check that it either doesn't exist or contains only
whitespace using a string method like strip() or a regex check. Only allow the
fence to close if this whitespace validation passes, otherwise treat the line as
regular content inside the code block.
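A minimal sketch of the requested check follows the CommonMark rule that a closing fence uses the same character as the opening fence, is at least as long, is indented at most three spaces, and may be followed only by spaces or tabs. The names `fence_char`, `fence_len`, and `matched_group` mirror the review comment; the `close_open_fence` function itself is hypothetical and handles fences only, not the HTML tags that `close_open_markdown` also repairs.

```python
import re

# Up to three spaces of indentation, then a run of 3+ backticks or 3+ tildes.
FENCE_RE = re.compile(r"^ {0,3}(`{3,}|~{3,})(.*)$")


def close_open_fence(text: str) -> str:
    """Return the suffix needed to close a code fence left open in ``text``."""
    fence_char, fence_len = "", 0  # empty fence_char: not inside a fence
    for line in text.split("\n"):
        match = FENCE_RE.match(line)
        if not match:
            continue
        matched_group, rest = match.groups()
        char = matched_group[0]
        if not fence_char:
            # Opening fence; a backtick fence's info string may not contain backticks.
            if char == "`" and "`" in rest:
                continue
            fence_char, fence_len = char, len(matched_group)
        elif (
            char == fence_char
            and len(matched_group) >= fence_len
            and not rest.strip(" \t")
        ):
            # Closing fence: only spaces or tabs may follow the marker.
            fence_char, fence_len = "", 0
    if not fence_char:
        return ""
    return ("" if text.endswith("\n") else "\n") + fence_char * fence_len + "\n"
```

With the whitespace check in place, a fence marker followed by other text stays inside the block, so the function still reports the fence as open and returns a closing marker of the original length.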


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 093e2b47-86b6-41de-a11e-9ff652c597b9

📥 Commits

Reviewing files that changed from the base of the PR and between 9ff72ff and 6aeea11.

📒 Files selected for processing (10)

src/app/endpoints/streaming_query.py
src/constants.py
src/models/common/turn_summary.py
src/utils/agents/streaming.py
src/utils/markdown_repair.py
src/utils/stream_interrupts.py
tests/unit/app/endpoints/test_streaming_query.py
tests/unit/utils/agents/test_streaming.py
tests/unit/utils/test_markdown_repair.py
tests/unit/utils/test_stream_interrupts.py

📜 Review details

⏰ Context from checks skipped due to timeout. (12)

GitHub Check: E2E: library mode / ci / group 3
GitHub Check: E2E: server mode / ci / group 3
GitHub Check: E2E: library mode / ci / group 2
GitHub Check: E2E: server mode / ci / group 2
GitHub Check: E2E: server mode / ci / group 1
GitHub Check: E2E: library mode / ci / group 1
GitHub Check: build-pr
GitHub Check: E2E Tests for Lightspeed Evaluation job
GitHub Check: integration_tests (3.13)
GitHub Check: integration_tests (3.12)
GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-0-6-on-pull-request

🧰 Additional context used

📓 Path-based instructions (5)

src/**/*.py