
[v2-rebuild] databricks-pack — community design review (6 skills, feedback wanted before code) #795

@jeremylongshore

Context

The databricks-pack plugin in this marketplace is being rebuilt from the ground up. The v1 was thin documentation; v2 is designed as a working plugin pack with live workspace access: six skills, one shared MCP server, and real detection logic against real workloads.

Every design decision is on the table here before code lands. I have catalogued 34 production failure modes from secondary research (Reddit, GitHub issues, Databricks community forum, vendor docs) and ranked them into six skills. The research is solid for what gets formally reported. It is not solid for what gets cursed at silently — and that is the input I am asking this community to provide.

The honest stance: I can build any of this. I am not the one running production Databricks workloads. If you tell me which failure modes you would actually want this to catch, which detection thresholds will fire correctly at 2 AM versus on every healthy workspace, and which skill designs senior data engineers will actually reach for, I will build what you say — not what I would guess. Implementation is not the rate-limiting input; calibrated priority signal is.

The six skills

| # | Skill | What it catches | Issue | Status |
|---|-------|-----------------|-------|--------|
| 1 | databricks-workspace-mcp | Typed control-plane access (clusters.events, instance pools, pipeline event logs, external locations, storage credentials) | #789 | 🟡 design open |
| 2 | databricks-cost-leak-hunter | Ranked real-dollar cost leaks from system.billing.usage + live workspace state (pilot; start here) | #790 | 🟡 design open |
| 3 | databricks-cluster-forensics | Cold starts, launch failures, Photon fallback, DBR-upgrade triage from live cluster events | #791 | 🟡 design open |
| 4 | databricks-streaming-guardian | Delta + Liquid Clustering + Structured Streaming + DLT operations (the largest skill, twelve failure modes) | #792 | 🟡 design open |
| 5 | databricks-uc-migration-pilot | Unity Catalog readiness, IAM/SCIM diagnostics, system-table access tracing (Sept 30 2026 deadline driver) | #793 | 🟡 design open |
| 6 | databricks-bundle-medic | Asset Bundles deploy diagnostics, CMK rotation, PrivateLink endpoint audit | #794 | 🟡 design open |
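
To make "ranked real-dollar cost leaks" in row 2 concrete, here is a minimal sketch of the general idea, not the skill's actual design (that lives in #790). It assumes usage rows have already been fetched and flattened into a simple shape. `UsageRow`, `rank_by_estimated_cost`, the SKU names, and the prices are all hypothetical and only loosely modeled on system.billing.usage.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRow:
    """Hypothetical, flattened stand-in for one row of system.billing.usage."""
    cluster_id: str
    sku_name: str
    usage_quantity: float  # DBUs consumed


def rank_by_estimated_cost(rows, price_per_dbu):
    """Sum estimated dollars per cluster; return (cluster_id, dollars) pairs, highest first.

    price_per_dbu maps sku_name -> assumed USD per DBU.
    Rows whose SKU has no known price are skipped rather than guessed.
    """
    totals = defaultdict(float)
    for row in rows:
        price = price_per_dbu.get(row.sku_name)
        if price is None:
            continue
        totals[row.cluster_id] += row.usage_quantity * price
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up sample data, for illustration only.
SAMPLE_ROWS = [
    UsageRow("cluster-a", "SKU_JOBS", 100.0),
    UsageRow("cluster-b", "SKU_ALL_PURPOSE", 40.0),
    UsageRow("cluster-a", "SKU_JOBS", 50.0),
]
SAMPLE_PRICES = {"SKU_JOBS": 0.25, "SKU_ALL_PURPOSE": 0.5}

ranking = rank_by_estimated_cost(SAMPLE_ROWS, SAMPLE_PRICES)
```

A real detector would also need the "live workspace state" half of the description (for example, whether a cluster is still running), which this sketch deliberately leaves out.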

Start here if you only have 10 minutes

Read #790 — databricks-cost-leak-hunter first. It is the pilot skill — the first thing in the rebuilt pack a user will actually run, designed to be FinOps-grokkable on the first read. The four cost categories it catches are the ones I would most want validated against your real workloads, and the design questions inside are the ones I am most uncertain about. If you only leave thumbs-up / thumbs-down on the question bullets there, you have already calibrated the most important skill in the pack.

How to engage

  • Comment on any issue with thoughts, corrections, or alternative approaches.
  • Thumbs-up / thumbs-down on individual design-question bullets is equally valuable — that is pure priority signal and takes 30 seconds.
  • Voice memo via WhatsApp works for anyone in Luciano's community who would prefer to speak instead of type in English — I will transcribe back into the relevant issue with credit. Portuguese is fine.

What this is not

  • Not a code review request. There is no code yet — by design.
  • Not a sales pitch. The pack will ship under the same marketplace license as every other plugin here.
  • Not a request for unpaid consulting hours. Any depth of engagement is fine; no engagement is also fine.

Why public

Every catalogued failure mode is already public (forum threads, GitHub issues, vendor docs). The implementation is the novelty, not the design — so the design can live in the open and the people who would benefit can shape it before code lands. Full transparency was deliberate; if any of you would have made a different call, please say so in this thread.

Reference material

All design records, pain research, and pressure tests live in plugins/saas-packs/databricks-pack/000-docs/ and are linked individually below. Index file: 000-INDEX.md.

Pain research (RL-RSRC)

The 34 catalogued failure modes, grouped by Databricks domain. These are the inputs that drove which skills were designed and at what detection thresholds.

| Doc | What it covers |
|-----|----------------|
| 002-RL-RSRC | Domain 1: Compute / Photon / DBR versioning / cost |
| 003-RL-RSRC | Domain 2: Delta Lake, Liquid Clustering, Structured Streaming, DLT |
| 004-RL-RSRC | Domain 3: Unity Catalog, Asset Bundles, identity, workspace ops, secrets, networking |
| 005-RL-RSRC | Architectural patterns extracted from Anthropic-published Claude Code skills (2026) |
| 006-RL-RSRC | v2-rebuild synthesis: failure modes → six skills mapping |
| 010-RL-RSRC | Databricks MCP landscape (official, managed) |
| 011-RL-RSRC | Databricks MCP community landscape (2026) |
| 012-RL-RSRC | Claude Code / Cowork ↔ Databricks MCP integration architecture |

Architecture decisions (AT-ADEC)

Locked design decisions with reasoning. These are the load-bearing records — if you disagree, this is where to push back.

| Doc | What it locks |
|-----|---------------|
| 007-AT-ADEC | CTO decision: Databricks Pack v2 rebuild (REVISED) |
| 013-AT-ADEC | Epic 1 MCP scope adjustment: auth flows, dual-MCP design, scope cut from 8 → 6 endpoints |

Pressure tests (RA-REVW)

Adversarial validation of the decisions above.

| Doc | What it stress-tests |
|-----|----------------------|
| 008-RA-REVW | Pack-handling pressure test: version bump in place vs. fresh pack |
| 009-RA-REVW | Pilot-timing pressure test: databricks-cost-leak-hunter as pilot, 3-5 day window |

Direct reading is welcome but not required. The issue bodies summarize what is needed for response.

  • Jeremy Longshore
    intentsolutions.io

Internal tracking (beads for agent context — not for external action):


Labels

  • community-design-review: Practitioner input is the load-bearing input for this issue
  • databricks-pack: Scope label for Databricks plugin pack work
  • feedback-wanted: Issue is soliciting community feedback before code lands
  • meta: Tracking or umbrella issue
  • saas-packs: SaaS integration packs under plugins/saas-packs/
  • v2-rebuild: Temporal marker for v2 rebuild initiatives (sunsetable)
