LCORE-2336: Unified llama_stack.config schema + synthesizer by max-svistunov · Pull Request #1963 · lightspeed-core/lightspeed-stack

max-svistunov · 2026-06-22T07:26:07Z

Description

Implements LCORE-2336, the first implementation ticket of the unified
lightspeed-stack.yaml feature (LCORE-836 spike). It collapses the two-file
configuration (lightspeed-stack.yaml + an external Llama Stack run.yaml)
into a single file: operators describe inference providers at a high level and
LCORE synthesizes the full run.yaml at startup. run.yaml becomes an
implementation detail LCORE owns rather than an operator-facing artifact.

What this PR adds:

Schema (src/models/config.py): UnifiedInferenceProvider,
InferenceConfiguration.providers (a root-level, backend-agnostic synthesis
input per Decision S5), UnifiedLlamaStackConfig + LlamaStackConfiguration.config
(baseline / profile / native_override). The nested check_llama_stack_model
validator now accepts a config block in lieu of library_client_config_path;
a new root check_unified_vs_legacy validator enforces mutual exclusion
between unified synthesis inputs and the legacy path, pointing operators at
--migrate-config.
Synthesizer (src/llama_stack_configuration.py): synthesize_configuration
(baseline -> enrichment -> high-level inference -> native_override deep-merge),
deep_merge_list_replace, apply_high_level_inference (with PROVIDER_TYPE_MAP
and the sentence_transformers -> sentence-transformers provider_id fix from
the PoC), load_default_baseline, and synthesize_to_file (writes mode 0600).
Shipped baseline (src/data/default_run.yaml): a thinner default whose
external_providers_dir carries a default so it resolves when
EXTERNAL_PROVIDERS_DIR is unset. src/data is a package so the file ships
in built wheels (verified).
Wiring: library-mode synthesis in src/client.py; a
--synthesized-config-output flag in src/lightspeed_stack.py propagated to
the uvicorn workers; shared constants; .gitignore for ./.generated/.

Out of scope (separate tickets): migration tool (LCORE-2337), container/deploy
artifacts (LCORE-2338), deprecation WARN (LCORE-2339), e2e (LCORE-2341/2343),
docs and examples (LCORE-2345/2346).

Type of change

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

Assisted-by: Claude Opus 4.8
Generated by: Claude Opus 4.8

Related Tickets & Documents

Related Issue # LCORE-2336
Closes # LCORE-2336

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

Manual verification (library mode, end-to-end)

Create a unified lightspeed-stack.yaml with no external run.yaml. The
minimal form needs only a top-level inference.providers list — no
llama_stack.config block:
```
llama_stack:
  use_as_library_client: true
inference:
  default_model: gpt-4o-mini
  default_provider: openai
  providers:
    - type: openai
      api_key_env: OPENAI_API_KEY
      allowed_models: [gpt-4o-mini]
    - type: sentence_transformers
```
(Add an optional llama_stack.config block with baseline / profile /
native_override when you need the escape hatch — verified separately.)
export OPENAI_API_KEY=<key> and start library mode:
uv run lightspeed-stack -c lightspeed-stack-unified.yaml

Confirm the service boots and serves:

$ curl -s localhost:8080/liveness
{"alive":true}
$ curl -s localhost:8080/readiness
{"ready":true,"reason":"All providers are healthy","providers":[]}
$ curl -s -X POST localhost:8080/v1/query -H 'Content-Type: application/json' \
    -d '{"query": "Name three primary colors. One sentence."}'
{"response":"The three primary colors are red, blue, and yellow.", ...}

Confirm the synthesized file (R6 / R10):

$ ls -l ./.generated/run.yaml
-rw-------. 1 user user 2545 ... ./.generated/run.yaml      # mode 0600
$ grep -E 'provider_id:|api_key:' ./.generated/run.yaml
      api_key: ${env.OPENAI_API_KEY}                        # env ref, never resolved
      provider_id: openai
    - provider_id: sentence-transformers                    # hyphenated id

Tests specific to this change

uv run python -m pytest tests/unit/test_llama_stack_synthesize.py \
    tests/unit/models/config/test_llama_stack_configuration.py \
    tests/unit/test_client.py -q
# 55 passed

Full unit suite (Pydantic-config change)

uv run python -m pytest tests/unit/ -q
# 2714 passed, 1 skipped

uv run make verify passes for all changed files (black, pylint 10.00/10,
ruff, pyright, pydocstyle). NOTE: make verify currently also fails on a
pre-existing, unrelated mypy issue in
tests/integration/container_lifecycle/test_container_lifecycle.py (missing
type annotations on the current main), untouched by this PR.

Summary by CodeRabbit

New Features
- Unified-mode configuration synthesis: configure inference providers and Llama Stack settings using a simplified, backend-agnostic format.
- High-level inference provider definitions: specify providers with type, API keys, and allowed models.
- CLI option --synthesized-config-output to customize the output path for synthesized configurations.
- Built-in baseline configuration for unified-mode synthesis.
Documentation
- Updated API schema to reflect unified-mode configuration options and new provider definitions.

…a_stack.config) Introduce the Pydantic schema for unified-mode Llama Stack configuration, the first piece of the single-file lightspeed-stack.yaml feature (LCORE-836 spike). Operators describe inference providers at a high level and let LCORE synthesize the Llama Stack run.yaml, instead of maintaining a separate run.yaml file. - UnifiedInferenceProvider: a backend-agnostic inference provider entry (type, api_key_env, allowed_models, extra). - InferenceConfiguration.providers: the high-level synthesis input (Decision S5 - lives at the configuration root so it survives a future backend change); default_model/default_provider keep their query-time routing meaning and are independent of this list. - UnifiedLlamaStackConfig + LlamaStackConfiguration.config: backend-specific knobs (baseline, profile, native_override). Mode detection lives entirely on the root Configuration model (check_unified_vs_legacy), since the deciding inputs span the root (inference.providers) and the nested llama_stack block: - A synthesis input (a non-empty inference.providers or a llama_stack.config block) and the legacy library_client_config_path are mutually exclusive, with an error pointing operators at --migrate-config. - Library mode requires some run source; inference.providers alone is sufficient, so no llama_stack.config block is required for the minimal unified config. The nested LlamaStackConfiguration no longer rejects a pathless library config (it cannot see inference.providers); it only file-checks a legacy path. Legacy behavior is otherwise unchanged. Regenerate docs/openapi.json for the new schemas and fields. Add unit tests for unified-mode validation, mutual exclusion, the inference.providers-only shape, and the library-mode-without-run-source error; update the dump-config expected dicts for the new config/providers fields.

Add the synthesis pipeline that builds a complete Llama Stack run.yaml from a unified lightspeed-stack.yaml, so run.yaml becomes an implementation detail LCORE owns rather than an operator-facing artifact. - synthesize_configuration: baseline selection (built-in default, a profile file, or empty) -> existing enrichment (Azure Entra ID, BYOK RAG, Solr/OKP, for parity with legacy mode) -> high-level inference expansion -> a raw native_override deep-merged last. - deep_merge_list_replace: maps merge recursively, lists and scalars replace wholesale (Decision T2). - apply_high_level_inference: maps each provider type to a Llama Stack provider_type via PROVIDER_TYPE_MAP, hyphenates the emitted provider_id so inline embedders match the baseline convention (sentence_transformers -> sentence-transformers), replaces a baseline entry of the same id rather than duplicating it, and emits ${env.<VAR>} references for secrets. - synthesize_to_file: writes the result with mode 0600, re-chmodded on each boot, since native_override may carry literal secrets. - load_default_baseline + src/data/default_run.yaml: a thinner shipped baseline whose external_providers_dir carries a default so it resolves when EXTERNAL_PROVIDERS_DIR is unset. src/data is a package so the file ships in built wheels. Add unit tests covering merge semantics, inference expansion, the full pipeline, profile resolution, and the 0600 write.

Drive the synthesizer from the running service and let operators inspect or relocate its output. - client.py: _load_library_client treats a library_client_config_path with no unified config block as legacy (enrich the operator-supplied run.yaml) and every other library-mode shape as unified (synthesize a fresh run.yaml to a persistent path, overwritten each boot at mode 0600). Because the root Configuration validator guarantees a run source and forbids a path coexisting with unified inputs, a config driven only by the root-level inference.providers (no config block) correctly falls through to synthesis, and the client needs no access to the root configuration. - lightspeed_stack.py: add --synthesized-config-output, propagated to the uvicorn workers (separate processes that perform the synthesis) via an environment variable, mirroring how the config path is already propagated. - constants.py: shared env-var names and the default synthesized-config path (./.generated/run.yaml). - .gitignore: ignore the generated run.yaml directory.

coderabbitai · 2026-06-22T07:26:21Z

Walkthrough

Adds a unified-mode Llama Stack configuration synthesis pipeline. New UnifiedInferenceProvider and UnifiedLlamaStackConfig Pydantic models capture backend-agnostic provider entries and synthesis controls. A synthesis pipeline in llama_stack_configuration.py builds a complete run.yaml from a lightspeed-stack.yaml input, with a shipped default baseline in src/data/default_run.yaml. The client and CLI are updated to route between legacy enrichment and unified synthesis paths.

Changes

Unified-mode Llama Stack config synthesis

Layer / File(s)	Summary
Unified-mode data contracts and constants `src/constants.py`, `src/models/config.py`, `docs/openapi.json`	Adds `UnifiedInferenceProvider` and `UnifiedLlamaStackConfig` models; adds `LlamaStackConfiguration.config` and `InferenceConfiguration.providers` fields; adds `check_unified_vs_legacy` root validator enforcing mutual exclusivity with legacy path; updates `check_llama_stack_model` to allow unified-mode inputs; introduces three env-var name and default-path constants; reflects all new schemas in OpenAPI docs.
Built-in default baseline YAML `src/data/__init__.py`, `src/data/default_run.yaml`	Adds the shipped baseline `run.yaml` used when `baseline == "default"`, defining apis, providers (inference/embedding/files/safety/tool_runtime/vector_io/agents), storage backends, registered resources, vector_stores, and default safety shield.
Synthesis pipeline `src/llama_stack_configuration.py`	Adds `PROVIDER_TYPE_MAP`, `load_default_baseline()`, `deep_merge_list_replace()`, `apply_high_level_inference()`, `_resolve_profile_path()`, `synthesize_configuration()`, and `synthesize_to_file()` implementing the full synthesis flow with mode-0600 file writes.
Client and CLI wiring `src/client.py`, `src/lightspeed_stack.py`, `.gitignore`	Branches `_load_library_client` between legacy enrichment and unified synthesis via `_synthesize_library_config`; extends the CLI with `--synthesized-config-output`; propagates `SYNTHESIZED_CONFIG_PATH_ENV_VAR` to uvicorn workers; adds `.generated/` to `.gitignore`.
Tests `tests/unit/test_llama_stack_synthesize.py`, `tests/unit/models/config/test_llama_stack_configuration.py`, `tests/unit/models/config/test_dump_configuration.py`, `tests/unit/test_client.py`	Adds synthesis pipeline unit tests covering baseline loading, deep-merge semantics, inference expansion, full synthesis composition, and 0600 file-write/overwrite permissions; updates config model tests for unified-mode validation; updates dump tests for new default-null/empty-list fields; updates client test for synthesis error path.

Sequence Diagram(s)

sequenceDiagram
  participant CLI as lightspeed_stack CLI
  participant Client as _load_library_client
  participant Synth as _synthesize_library_config
  participant Pipeline as synthesize_configuration
  participant FS as File System (run.yaml, mode 0600)
  participant LlamaStack as AsyncLlamaStackAsLibraryClient

  CLI->>CLI: set CONFIG_PATH_ENV_VAR, optionally SYNTHESIZED_CONFIG_PATH_ENV_VAR
  Client->>Client: check library_client_config_path and config.config
  alt legacy mode (library_client_config_path set, config.config is None)
    Client->>Client: enrich legacy config path
  else unified mode
    Client->>Synth: invoke _synthesize_library_config
    Synth->>Synth: read operator YAML from CONFIG_PATH_ENV_VAR
    Synth->>Pipeline: synthesize_configuration(lcs_config, config_file_dir)
    Pipeline->>Pipeline: load baseline (default/profile/empty)
    Pipeline->>Pipeline: apply_high_level_inference
    Pipeline->>Pipeline: deep_merge_list_replace with native_override
    Pipeline-->>Synth: synthesized config dict
    Synth->>FS: synthesize_to_file (write + chmod 0600)
    Synth-->>Client: output_path
  end
  Client->>LlamaStack: initialize with _config_path

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

lightspeed-core/lightspeed-stack#1580: Implements the same unified-mode lightspeed-stack.yaml → synthesized run.yaml flow including the llama_stack.config / inference.providers models, config synthesis pipeline in src/llama_stack_configuration.py, and client/CLI wiring in src/client.py.

Suggested reviewers

asimurka

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'LCORE-2336: Unified llama_stack.config schema + synthesizer' directly summarizes the main change: introducing unified configuration schema and synthesis logic for the Llama Stack.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

✨ Simplify code

Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/data/default_run.yaml`:
- Around line 102-105: The provider_shield_id value in the shield_id:
llama-guard configuration is incorrectly set to openai/gpt-4o-mini, which is a
chat/generative model rather than a guard model. This prevents the safety shield
from properly gating queries. Update the provider_shield_id field under the
llama-guard shield to reference the actual Llama Guard model identifier such as
meta-llama/Llama-Guard-3-8B instead of the chat model identifier.

In `@src/lightspeed_stack.py`:
- Around line 169-175: The current code only sets the
SYNTHESIZED_CONFIG_PATH_ENV_VAR environment variable when
args.synthesized_config_output is provided, but does not unset it when the flag
is omitted, causing stale values to persist in the environment and preventing
workers from using the intended default fallback behavior. Add an else clause to
the existing if condition (after args.synthesized_config_output is not None
check) that explicitly removes constants.SYNTHESIZED_CONFIG_PATH_ENV_VAR from
os.environ using the pop method with a default value to safely handle cases
where the variable may not be set, ensuring workers always use the correct path
whether the override is provided or not.

In `@src/models/config.py`:
- Around line 708-713: In the Pydantic model in config.py, replace the `Any`
type annotation in the `extra` field (around line 708) and the `native_override`
field (around line 751) with specific type annotations instead of `dict[str,
Any]`. Review the usage of these fields throughout the codebase to determine the
appropriate concrete type for the dictionary values (such as `str`, `int`, or a
union of specific types) and replace `Any` with that specific type to maintain
strong schema typing and comply with the coding guidelines.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 84ac8fd7-6cbb-411f-b460-32eb3664dff5

📥 Commits

Reviewing files that changed from the base of the PR and between ee37367 and eef0738.

📒 Files selected for processing (13)

.gitignore
docs/openapi.json
src/client.py
src/constants.py
src/data/__init__.py
src/data/default_run.yaml
src/lightspeed_stack.py
src/llama_stack_configuration.py
src/models/config.py
tests/unit/models/config/test_dump_configuration.py
tests/unit/models/config/test_llama_stack_configuration.py
tests/unit/test_client.py
tests/unit/test_llama_stack_synthesize.py

📜 Review details

⏰ Context from checks skipped due to timeout. (6)

GitHub Check: integration_tests (3.13)
GitHub Check: ruff
GitHub Check: unit_tests (3.13)
GitHub Check: E2E Tests for Lightspeed Evaluation job
GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-0-6-on-pull-request

🧰 Additional context used

📓 Path-based instructions (5)

src/**/*.py