
RHIDP-12952: persist interrupted conversation #1971

Open
Jdubrick wants to merge 6 commits into lightspeed-core:main from Jdubrick:interrupt-message-persistence

Conversation

@Jdubrick Jdubrick commented Jun 22, 2026

Contributor

Description

  • When query interruption was first added, it replaced the entire interrupted portion of the conversation with the interrupt message. This change keeps the half-completed message instead, after repairing any broken code fences, HTML, tables, etc.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Claude (Cursor)
  • Generated by: Claude (Cursor)

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

Release Notes

  • New Features

    • Interrupted streaming responses now preserve already-generated partial text and automatically repair incomplete Markdown or HTML formatting before displaying an interruption notice.
  • Bug Fixes

    • Improved handling of truncated content formatting when streams are interrupted.
  • Updates

    • Updated interruption message to "Response stopped by the user."

Jdubrick added 5 commits June 22, 2026 09:50
Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>
Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>
Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>
Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>
Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>

coderabbitai Bot commented Jun 22, 2026

Contributor

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0ff64f7d-24cd-4bb2-8fdd-564a08992d61

📥 Commits

Reviewing files that changed from the base of the PR and between 6aeea11 and 969f5d6.

📒 Files selected for processing (2)
  • src/utils/markdown_repair.py
  • tests/unit/utils/test_markdown_repair.py

Walkthrough

The PR replaces the static interrupted-response message with a mechanism that preserves partial output. Streamed token deltas are accumulated in TurnSummary.partial_tokens, and the next SSE chunk index is tracked in TurnSummary.next_chunk_id. On asyncio.CancelledError, both the agent and the deprecated endpoint streaming paths repair any unclosed Markdown with a new close_open_markdown utility and emit the resulting suffix as a final token event.
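
The accumulation step can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `TurnSummarySketch` and `record_token` are hypothetical names, and only the two fields `partial_tokens` and `next_chunk_id` come from the walkthrough.

```python
from dataclasses import dataclass, field


@dataclass
class TurnSummarySketch:
    """Hypothetical stand-in for TurnSummary; only these two fields are from the PR."""

    partial_tokens: list[str] = field(default_factory=list)
    next_chunk_id: int = 0


def record_token(summary: TurnSummarySketch, chunk_id: int, text: str) -> None:
    """Remember a streamed delta and the index the next SSE chunk should use."""
    summary.partial_tokens.append(text)
    summary.next_chunk_id = chunk_id + 1


summary = TurnSummarySketch()
for chunk_id, delta in enumerate(["Hello", ", ", "world"]):
    record_token(summary, chunk_id, delta)

# Joining the deltas recovers everything the client has already received.
partial_text = "".join(summary.partial_tokens)
```

Keeping `next_chunk_id` alongside the tokens means a later interruption event can continue the chunk numbering instead of restarting it.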

Changes

Structured Interrupted Response from Partial Tokens

| Layer / File(s) | Summary |
|---|---|
| TurnSummary fields and Markdown repair utility<br>src/models/common/turn_summary.py, src/utils/markdown_repair.py, tests/unit/utils/test_markdown_repair.py | TurnSummary gains partial_tokens: list[str] and next_chunk_id: int. New markdown_repair.py exposes close_open_markdown(text), which returns the minimal suffix to close any open code fence and block-level HTML tags; fully tested. |
| build_interrupted_response and persistence<br>src/constants.py, src/utils/stream_interrupts.py, tests/unit/utils/test_stream_interrupts.py | build_interrupted_response(partial_tokens) joins tokens, repairs Markdown, and returns (full_text, suffix). _on_interrupt and persist_interrupted_turn now use turn_summary.llm_response (set to full_text) instead of the static constant. INTERRUPTED_RESPONSE_MESSAGE wording updated. |
| Agent streaming: accumulation and cancellation suffix<br>src/utils/agents/streaming.py, tests/unit/utils/agents/test_streaming.py | _process_token appends token text to turn_summary.partial_tokens and sets turn_summary.next_chunk_id; AgentRunResultEvent handler also sets next_chunk_id. On CancelledError, generate_agent_response calls build_interrupted_response, persists with full_text, and emits a TokenStreamPayload carrying suffix at next_chunk_id. New tests validate monotonic chunk IDs and zero-token cancellation. |
| Endpoint cancellation path and delta tracking<br>src/app/endpoints/streaming_query.py, tests/unit/app/endpoints/test_streaming_query.py | generate_response cancellation handler replaced to call build_interrupted_response and emit LLM_TOKEN_EVENT with the suffix. response_generator sets turn_summary.next_chunk_id at kickoff, each delta, and completion; deltas are also appended to turn_summary.partial_tokens. |
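
The two helpers summarized above could look roughly like this. It is a simplified sketch under stated assumptions, not the code in src/utils/markdown_repair.py or src/utils/stream_interrupts.py: it handles code fences only (the PR's utility also closes block-level HTML tags), and the way the interruption notice is joined to the text is a guess. Only the function names, the `(full_text, suffix)` return shape, and the message wording come from the PR.

```python
import re

# Wording taken from the release notes; how it is appended to the text is assumed.
INTERRUPTED_RESPONSE_MESSAGE = "Response stopped by the user."

# Up to three spaces of indent, then a run of 3+ backticks or tildes.
_FENCE_RE = re.compile(r"^ {0,3}(`{3,}|~{3,})(.*)$")


def close_open_markdown(text: str) -> str:
    """Return a suffix that closes an unterminated code fence (fences only)."""
    open_fence = ""  # marker of the currently open fence, or "" if none is open
    for line in text.splitlines():
        match = _FENCE_RE.match(line)
        if match is None:
            continue
        marker, rest = match.group(1), match.group(2)
        if not open_fence:
            open_fence = marker
        elif (
            marker[0] == open_fence[0]
            and len(marker) >= len(open_fence)
            and not rest.strip()
        ):
            # Same fence character, at least as long, nothing but whitespace after.
            open_fence = ""
    if not open_fence:
        return ""
    newline = "" if text.endswith("\n") else "\n"
    return f"{newline}{open_fence}"


def build_interrupted_response(partial_tokens: list[str]) -> tuple[str, str]:
    """Join tokens, repair Markdown, and return (full_text, suffix)."""
    partial_text = "".join(partial_tokens)
    if not partial_text:
        # Nothing was streamed: the notice is the whole response.
        return INTERRUPTED_RESPONSE_MESSAGE, INTERRUPTED_RESPONSE_MESSAGE
    repair = close_open_markdown(partial_text)
    suffix = f"{repair}\n\n{INTERRUPTED_RESPONSE_MESSAGE}"
    return partial_text + suffix, suffix


full_text, suffix = build_interrupted_response(["Try:\n", "```python\n", "print('hi')"])
```

Returning the suffix separately lets the caller stream only the new characters to the client, while `full_text` is what gets persisted.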

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant generate_response / generate_agent_response
  participant _process_token / response_generator
  participant build_interrupted_response
  participant close_open_markdown
  participant persist_interrupted_turn

  rect rgba(100, 149, 237, 0.5)
    note over _process_token / response_generator: Normal streaming
    _process_token / response_generator-->Client: token SSE events (chunk_id 0, 1, 2…)
    _process_token / response_generator->>generate_response / generate_agent_response: appends partial_tokens, sets next_chunk_id
  end

  rect rgba(220, 100, 100, 0.5)
    note over generate_response / generate_agent_response: asyncio.CancelledError
    generate_response / generate_agent_response->>build_interrupted_response: partial_tokens
    build_interrupted_response->>close_open_markdown: joined text
    close_open_markdown-->>build_interrupted_response: repair suffix
    build_interrupted_response-->>generate_response / generate_agent_response: (full_text, emit_suffix)
    generate_response / generate_agent_response->>persist_interrupted_turn: llm_response = full_text
    generate_response / generate_agent_response-->>Client: final token SSE event (emit_suffix, next_chunk_id)
  end
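
The cancellation branch of the diagram can be illustrated with plain asyncio. This is a toy sketch, not the PR's handlers: `fake_model_stream`, `stream_events`, and the `(chunk_id, text)` tuples are hypothetical stand-ins for the real stream and SSE payloads, the suffix string is assumed, and whether the real code re-raises after emitting the final event is not stated in this thread.

```python
import asyncio
from collections.abc import AsyncIterator


async def fake_model_stream() -> AsyncIterator[str]:
    """Hypothetical token source: yields two deltas, then stalls."""
    for delta in ["Partial ", "answer"]:
        yield delta
    await asyncio.sleep(60)  # simulate a model that is still generating


async def stream_events(sent: list[tuple[int, str]]) -> None:
    """Record (chunk_id, text) events; on cancellation record one final suffix event."""
    next_chunk_id = 0
    try:
        async for delta in fake_model_stream():
            sent.append((next_chunk_id, delta))
            next_chunk_id += 1
    except asyncio.CancelledError:
        # Assumed suffix; the PR derives it via build_interrupted_response.
        sent.append((next_chunk_id, "\n\nResponse stopped by the user."))
        raise  # this sketch lets the cancellation propagate afterwards


async def main() -> list[tuple[int, str]]:
    sent: list[tuple[int, str]] = []
    task = asyncio.create_task(stream_events(sent))
    await asyncio.sleep(0.05)  # give the task time to consume both deltas
    task.cancel()  # plays the role of the user pressing "stop"
    try:
        await task
    except asyncio.CancelledError:
        pass
    return sent


events = asyncio.run(main())
```

The final event reuses the running counter, so the chunk IDs seen by the client stay monotonic across the interruption.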

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • lightspeed-core/lightspeed-stack#1870: Modifies the same deprecated generate_response/response_generator utilities in streaming_query.py and their agent stream-event wiring.
  • lightspeed-core/lightspeed-stack#1918: Changes the stream-interruption flow in streaming_query.py to delegate to stream_interrupts helpers (persist_interrupted_turn, interrupt callback), which this PR further extends.
  • lightspeed-core/lightspeed-stack#1919: Touches agent streaming SSE/interruption handling and persistence in utils/agents/streaming.py, the same file updated here.

Suggested reviewers

  • tisnik
  • jrobertboos
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: implementing persistence of interrupted conversations, which is the core feature of this PR. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 89.09% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/app/endpoints/streaming_query.py`:
- Around line 637-640: The build_interrupted_response call at line 637 relies
solely on turn_summary.partial_tokens, which may be empty or incomplete if
cancellation occurs after response.output_text.done has populated
turn_summary.llm_response but before all deltas are processed. Modify the
build_interrupted_response call to use turn_summary.llm_response as a fallback
when partial_tokens is empty, ensuring that model output is not lost when
interrupted responses are reconstructed and persisted.

In `@src/utils/markdown_repair.py`:
- Around line 75-90: In the fence closing logic (the elif condition checking
`char == fence_char and len(matched_group) >= fence_len`), add validation to
ensure that any trailing content after the fence marker contains only whitespace
characters (spaces and tabs). Extract the remainder of the line after the
matched fence group and check that it either doesn't exist or contains only
whitespace using a string method like strip() or a regex check. Only allow the
fence to close if this whitespace validation passes, otherwise treat the line as
regular content inside the code block.
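
Both findings above can be sketched as small helpers. These are hypothetical illustrations, not the PR's code: `is_closing_fence` and `tokens_for_interrupt` are invented names, the regex is an assumption about how fence lines are matched, and only `fence_char`, `fence_len`, `matched_group`, `partial_tokens`, and `llm_response` echo names from the comments.

```python
import re

# Assumed fence matcher: up to three spaces of indent, then 3+ backticks or tildes.
_FENCE_LINE_RE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def is_closing_fence(line: str, fence_char: str, fence_len: int) -> bool:
    """Return True only if `line` (without its newline) closes the open fence."""
    match = _FENCE_LINE_RE.match(line)
    if match is None:
        return False
    matched_group = match.group(1)
    remainder = line[match.end():]
    return (
        matched_group[0] == fence_char
        and len(matched_group) >= fence_len
        and remainder.strip(" \t") == ""  # only spaces or tabs may follow
    )


def tokens_for_interrupt(partial_tokens: list[str], llm_response: str) -> list[str]:
    """Prefer accumulated deltas; fall back to the completed text if there are none."""
    if partial_tokens:
        return partial_tokens
    return [llm_response] if llm_response else []
```

With the whitespace check in place, a line such as "```python" inside an open block is treated as content rather than as a closer, which matches how fenced code blocks are specified in CommonMark.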

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 093e2b47-86b6-41de-a11e-9ff652c597b9

📥 Commits

Reviewing files that changed from the base of the PR and between 9ff72ff and 6aeea11.

📒 Files selected for processing (10)
  • src/app/endpoints/streaming_query.py
  • src/constants.py
  • src/models/common/turn_summary.py
  • src/utils/agents/streaming.py
  • src/utils/markdown_repair.py
  • src/utils/stream_interrupts.py
  • tests/unit/app/endpoints/test_streaming_query.py
  • tests/unit/utils/agents/test_streaming.py
  • tests/unit/utils/test_markdown_repair.py
  • tests/unit/utils/test_stream_interrupts.py
📜 Review details
⏰ Context from checks skipped due to timeout. (12)
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: build-pr
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: integration_tests (3.13)
  • GitHub Check: integration_tests (3.12)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-0-6-on-pull-request
🧰 Additional context used
📓 Path-based instructions (5)
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

  • src/constants.py
  • src/models/common/turn_summary.py
  • src/utils/markdown_repair.py
  • src/utils/stream_interrupts.py
  • src/app/endpoints/streaming_query.py
  • src/utils/agents/streaming.py
src/constants.py

📄 CodeRabbit inference engine (AGENTS.md)

Use constants.py for shared constants with descriptive comments and type hints using Final[type]

Files:

  • src/constants.py
src/models/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models must use @model_validator and @field_validator for validation and complete type annotations for all attributes, avoiding Any type

Files:

  • src/models/common/turn_summary.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

  • tests/unit/utils/test_markdown_repair.py
  • tests/unit/app/endpoints/test_streaming_query.py
  • tests/unit/utils/agents/test_streaming.py
  • tests/unit/utils/test_stream_interrupts.py
src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/app/**/*.py: FastAPI dependencies: Import from fastapi module for APIRouter, HTTPException, Request, status, Depends
Use FastAPI HTTPException with appropriate status codes for API endpoints and handle APIConnectionError from Llama Stack

Files:

  • src/app/endpoints/streaming_query.py
🧠 Learnings (3)
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/common/turn_summary.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/common/turn_summary.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

  • src/app/endpoints/streaming_query.py
🔇 Additional comments (7)
src/models/common/turn_summary.py (1)

117-126: LGTM!

src/constants.py (1)

15-15: LGTM!

src/utils/stream_interrupts.py (1)

23-23: LGTM!

Also applies to: 219-239, 277-277, 286-286, 368-369

tests/unit/utils/test_stream_interrupts.py (1)

8-21: LGTM!

Also applies to: 49-49, 71-72, 100-100, 169-193

src/utils/agents/streaming.py (1)

28-28: LGTM!

Also applies to: 68-68, 201-217, 358-364, 415-415

tests/unit/utils/agents/test_streaming.py (1)

67-68: LGTM!

Also applies to: 720-722, 813-813, 966-1106

tests/unit/app/endpoints/test_streaming_query.py (1)

54-54: LGTM!

Also applies to: 74-75, 1385-1385, 1394-1400

Comment thread src/app/endpoints/streaming_query.py
Comment thread src/utils/markdown_repair.py
Signed-off-by: Jordan Dubrick <jdubrick@redhat.com>
@Jdubrick

Contributor Author

/cc @tisnik

