ci: add DBR LTS install check to catch ES-1960554-class regressions #843
Open
vikrantpuppala wants to merge 1 commit into
gopalldb
approved these changes
Jul 2, 2026
…554)

The thrift 0.23.0 bump (PR #796, shipped in 4.2.7) broke `pip install` on DBR LTS: thrift ships sdist-only and 0.23.0's setup.py calls sys.exit(0) on the build-success path, killing the PEP 517 backend before pip writes output.json. On the old setuptools shipped by DBR 14.3/15.4 LTS this is a hard install failure (SEV0 ES-1960554); 4.2.7 was yanked and reverted (#840).

Our CI never caught it because every job installs via `poetry install` on a modern runner; it never does a fresh `pip install` of the built wheel on an LTS toolchain, the real customer path that failed.

CI check
--------
Adds a PR check (gated to dependency changes) that builds the wheel and installs it INSIDE real DBR LTS clusters via the PECO workspace Jobs API (no PyPI publish), then runs a SELECT 1 smoke test. Matrix = supported LTS {13.3, 14.3, 15.4, 16.4, 17.3} x install target {base, pyarrow, kernel}.

Auth is OAuth M2M as the PECO service principal throughout (driver -> workspace API and the notebook's connector -> warehouse smoke query); a PAT is warehouse-scoped and rejected by the workspace REST API. Older LTS ship an SDK too old for auth_type=oauth-m2m, so the smoke harness upgrades databricks-sdk. Per-run artifacts are cleaned up in a finally block.

Connector fix (caught by the check)
-----------------------------------
The check surfaced a real latent bug: a base install (no [pyarrow] extra) runs against a runtime's bundled pyarrow, and on DBR 13.3/14.3 that pyarrow predates the `promote_options` kwarg, so concat_table_chunks raised `TypeError: concat_tables() got an unexpected keyword argument 'promote_options'` on the Arrow result path. utils.py now falls back to the legacy `promote=True` (equivalent to promote_options="default") when the kwarg is unsupported, with a regression test.
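A minimal sketch of the kind of fallback described above. It assumes nothing about the actual `utils.py` code beyond what the commit message states: the helper name `concat_tables_compat` and the injectable `concat_tables` parameter are illustrative, not the connector's API. The only pyarrow behavior relied on is that older builds reject the `promote_options` keyword with a `TypeError` while accepting the legacy `promote=True`.

```python
def concat_tables_compat(tables, concat_tables=None):
    """Concatenate Arrow tables on pyarrow builds with or without `promote_options`.

    `concat_tables` is injectable purely for illustration and testing;
    by default the real pyarrow.concat_tables is used.
    """
    if concat_tables is None:
        import pyarrow  # imported lazily so the sketch can be exercised with a stub

        concat_tables = pyarrow.concat_tables

    try:
        # Newer pyarrow: schema promotion is requested via promote_options.
        return concat_tables(tables, promote_options="default")
    except TypeError as exc:
        # Only handle the "unknown keyword" case; re-raise any other TypeError.
        if "promote_options" not in str(exc):
            raise
        # Older pyarrow: the legacy boolean flag requests the same promotion.
        return concat_tables(tables, promote=True)
```

Matching on the keyword name in the error message keeps unrelated `TypeError`s (for example, from passing a non-table) visible instead of silently retrying them.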
Validated end-to-end against the PECO workspace: green on thrift 0.22.0, and re-widening the pin to <0.24.0 fails on 14.3+15.4 with the exact output.json error, so it is a true guard, not a check that always passes.

Also adds an incident-linked comment on the thrift pin so nobody re-widens it before the upstream fix (THRIFT-6067 / apache/thrift#3584) ships.

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
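The failure mode named in this commit message (a `sys.exit(0)` on the success path ending the build backend before `output.json` is written) can be illustrated with a toy model. This is not pip's or setuptools' code: the child script, the `build_hook` function, and the `return_val` key are hypothetical stand-ins. The sketch only shows why a zero exit status is not enough for a parent process that expects a result file.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for a build backend process: run the "hook", then
# report the result by writing output.json into a control directory.
CHILD = r"""
import json, sys
from pathlib import Path

def build_hook(exit_early):
    if exit_early:
        sys.exit(0)  # "successful" exit from inside the hook
    return "example-0.0.0.tar.gz"

control_dir, exit_early = Path(sys.argv[1]), sys.argv[2] == "1"
result = build_hook(exit_early)
(control_dir / "output.json").write_text(json.dumps({"return_val": result}))
"""


def run_hook(exit_early: bool):
    """Run the toy backend; return (exit code, parsed output.json or None)."""
    with tempfile.TemporaryDirectory() as control_dir:
        proc = subprocess.run(
            [sys.executable, "-c", CHILD, control_dir, "1" if exit_early else "0"]
        )
        output = Path(control_dir) / "output.json"
        payload = json.loads(output.read_text()) if output.exists() else None
        return proc.returncode, payload
```

In this model both `run_hook(False)` and `run_hook(True)` exit with status 0, but only the first leaves an `output.json` behind; a parent that unconditionally opens the file would fail on the second.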
What
Adds a DBR LTS Install CI check that builds the connector wheel and installs it inside real DBR LTS clusters (via the PECO workspace Jobs API; no PyPI publish needed), then runs a `SELECT 1` smoke test. Matrix = supported LTS {13.3, 14.3, 15.4, 16.4, 17.3} × install target {base, pyarrow, kernel}.

Also adds an incident-linked comment on the `thrift` pin in `pyproject.toml` so nobody re-widens it before the upstream fix ships.

Why
The thrift 0.23.0 bump (PR #796, shipped in 4.2.7) broke `pip install` on DBR LTS (SEV0 ES-1960554). thrift ships sdist-only, and 0.23.0's `setup.py` calls `sys.exit(0)` on the build-success path, killing the PEP 517 backend before pip writes `output.json`. On the old setuptools shipped by DBR 14.3/15.4 LTS this is a hard install failure. 4.2.7 was yanked and the bump reverted (PR #840).

Our CI never caught it because every job installs via `poetry install` on a modern runner; it never does a fresh `pip install` of the built wheel on an LTS toolchain, which is the real customer path that failed. This PR closes exactly that gap.

How it works
Per matrix leg, `scripts/dbr_lts_install_check.py` (driver, runs on the GH runner):
- […] `scripts/dbr_lts_smoke_notebook.py` into the workspace,
- […] `spark_version`; the notebook `pip install`s the wheel (+ extras) and runs `SELECT 1`,
- […] `finally` (every exit path).

Several non-obvious DBR-cluster constraints are baked in and commented (`notebook_task`, not `spark_python_task`-from-Volume; `SINGLE_USER` access mode for UC/Volume access; `dbutils.fs.cp` the wheel off `/Volumes`; `dbutils.library.restartPython()` after install).

Gating
Runs on `pull_request`, but the cluster matrix runs only when dependency-affecting files change (`pyproject.toml` / `poetry.lock` / this workflow / the two scripts), the only surface that can introduce this failure class. Informational check (not a required merge-queue gate). Uses only secrets already present in the `azure-prod` environment (`DATABRICKS_HOST`, `DATABRICKS_TOKEN`, `TEST_PECO_WAREHOUSE_HTTP_PATH`); the notebook-import dir is derived from the token's own identity.

Validation
Exercised end-to-end against the PECO workspace:
- […] `SELECT 1` pass on 15.4 for both `base` and `[pyarrow]`.
- […] `<0.24.0` fails on 14.3 and 15.4 with the exact incident error (`Downloading thrift-0.23.0.tar.gz` → `OSError ... output.json`), so it is a true guard, not a check that always passes.

Follow-ups (external / not in this PR)
- […] `azure-prod` `DATABRICKS_TOKEN` can create job clusters and write to the `peco.default.ci_wheels` Volume.

This pull request and its description were written by Isaac.
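For reference, the matrix and the gating rule from the description can be modeled in a few lines. This is an illustrative sketch, not the workflow's actual configuration: the workflow file path is a hypothetical placeholder (the description only says "this workflow"), and the function names are made up for the example.

```python
from itertools import product

LTS_VERSIONS = ["13.3", "14.3", "15.4", "16.4", "17.3"]
INSTALL_TARGETS = ["base", "pyarrow", "kernel"]

# Files whose changes can affect dependency resolution or the check itself.
DEPENDENCY_PATHS = {
    "pyproject.toml",
    "poetry.lock",
    ".github/workflows/dbr-lts-install.yml",  # hypothetical workflow file name
    "scripts/dbr_lts_install_check.py",
    "scripts/dbr_lts_smoke_notebook.py",
}


def matrix_legs():
    """Every (LTS version, install target) pair the check would exercise."""
    return list(product(LTS_VERSIONS, INSTALL_TARGETS))


def should_run_matrix(changed_files):
    """True when at least one changed file is dependency-affecting."""
    return any(path in DEPENDENCY_PATHS for path in changed_files)
```

With five LTS versions and three install targets, `matrix_legs()` yields 15 legs, which is why gating the cluster runs to dependency-affecting changes matters for cost.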