feat: improve OpenAI error handling and surfacing #178

Open
HardeepAsrani wants to merge 3 commits into development from feat/openai-error-handling

HardeepAsrani (Member) commented Jun 30, 2026

Summary

Improves how the plugin handles and surfaces OpenAI errors, addressing three support-driven issues:

  • Codeinwp/hyve#149 — a free/no-credit key no longer validates as "good" then fails silently later.
  • Codeinwp/hyve#199 — knowledge-base indexing failures are now visible instead of failing silently.
  • Codeinwp/hyve#200 — admins get a clear dashboard notice when something requiring their action breaks the bot.

What changed

Key validation (#149). Validation now hits the embeddings endpoint (the capability the plugin actually uses), so a key is accepted only when it is genuinely usable: valid auth and available credits. Real-world finding: a brand-new unfunded account returns 429 even on the free moderation endpoint (with a null error code), so the old moderation-based check was both wrong and unhelpful. The embeddings endpoint instead returns a clean insufficient_quota code. Invalid keys are blocked; account-level problems (no credits, rate limit) save the key and show a warning rather than blocking, so the user can proceed once they add credits.
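The block/save-and-warn rule above can be sketched as a small decision function. This is an illustrative Python sketch, not the plugin's code: the names `decide`, `ValidationResult`, and `ACCOUNT_ERROR_CODES` are hypothetical, the code sets are example values rather than the contents of OpenAI::AUTH_ERROR_CODES, and treating unrecognised codes as save-and-warn is an assumption this description does not confirm.

```python
from dataclasses import dataclass
from typing import Optional

# Example values only; the PR's real lists are not shown in this description.
AUTH_ERROR_CODES = frozenset({"invalid_api_key"})
ACCOUNT_ERROR_CODES = frozenset({"insufficient_quota", "rate_limit_exceeded"})


@dataclass(frozen=True)
class ValidationResult:
    save_key: bool             # whether the key should be stored
    error_code: Optional[str]  # code to report back to the user, if any


def decide(error_code: Optional[str]) -> ValidationResult:
    """Map the embeddings probe result to a save/block decision."""
    if error_code is None:
        # Probe succeeded: the key is usable, nothing to report.
        return ValidationResult(save_key=True, error_code=None)
    if error_code in AUTH_ERROR_CODES:
        # Invalid key: block and do not store it.
        return ValidationResult(save_key=False, error_code=error_code)
    # Account-level problem (or, by assumption, any other code):
    # store the key and warn so the user can fix billing later.
    return ValidationResult(save_key=True, error_code=error_code)
```

With this shape, a caller only needs to check `save_key` before persisting and can pass `error_code` to whatever produces the user-facing message.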

Actionable messages (#149/#199). A single source of truth maps OpenAI error codes to actionable, translatable copy (e.g. "…no available credits… add billing or upgrade to a paid plan"), consumed by both REST responses and the dashboard notice. Error-code lists are centralised into OpenAI::AUTH_ERROR_CODES / OpenAI::PERSISTED_ERROR_CODES constants so they can't drift.
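A minimal sketch of the "single source of truth" idea, in Python for illustration only: one lookup table feeds both the REST response and the dashboard notice, so the two cannot disagree. The function names and the message wording here are hypothetical placeholders, not the plugin's translatable copy.

```python
from typing import Dict

# Placeholder wording for illustration; the plugin's real copy differs.
ERROR_MESSAGES: Dict[str, str] = {
    "invalid_api_key": "The API key was rejected. Check the key and try again.",
    "insufficient_quota": "The account has no credits. Review its billing settings.",
    "rate_limit_exceeded": "Too many requests. Please try again shortly.",
}
DEFAULT_MESSAGE = "The request failed. Please try again."


def message_for(code: str) -> str:
    """Single lookup used by every consumer."""
    return ERROR_MESSAGES.get(code, DEFAULT_MESSAGE)


def rest_error(code: str) -> Dict[str, str]:
    """Shape of a REST error payload (hypothetical)."""
    return {"code": code, "message": message_for(code)}


def dashboard_notice(code: str) -> str:
    """Text of the admin notice (hypothetical prefix)."""
    return "Service error: " + message_for(code)
```

Because both consumers call `message_for`, changing a message in the table updates every surface at once.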

KB indexing failures (#199). Previously, process_post() swallowed embedding/Qdrant failures and retried forever, while add_post() still reported success. Failures are now classified: fatal errors (bad key, no credits, billing) stop immediately and mark the entry failed; transient errors (rate limit, network) retry with backoff up to a cap (5 attempts), then give up. The reason is surfaced both as an immediate warning toast at add time and as a status badge in the data list ("…retried automatically" vs "…fix the problem and re-add"), and it clears on a successful attempt.
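The fatal/transient split with a retry cap could look roughly like the following Python sketch. Only the cap of 5 attempts comes from this description. The function name, the status strings, the exponential schedule with a 60-second base, and the choice to treat unlisted codes as transient are all assumptions made for illustration.

```python
from typing import Optional, Tuple

# Example fatal codes; the plugin's real list is not shown here.
FATAL_CODES = frozenset({"invalid_api_key", "insufficient_quota", "billing_not_active"})
MAX_ATTEMPTS = 5          # cap stated in the PR description
BASE_DELAY_SECONDS = 60   # assumed; the actual backoff schedule is not stated


def next_step(error_code: Optional[str], attempts_made: int) -> Tuple[str, Optional[int]]:
    """Return (status, retry_delay_seconds) after an indexing attempt.

    attempts_made counts attempts so far, including the one that just ran.
    """
    if error_code is None:
        return ("indexed", None)
    if error_code in FATAL_CODES:
        # Needs admin action; retrying cannot help.
        return ("failed", None)
    if attempts_made >= MAX_ATTEMPTS:
        # Transient error, but the retry budget is spent.
        return ("failed", None)
    # Transient error: double the delay on each retry.
    return ("retrying", BASE_DELAY_SECONDS * 2 ** (attempts_made - 1))
```

Under these assumptions the delays run 60, 120, 240 and 480 seconds, and the fifth failed attempt marks the entry failed.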

Dashboard notice (#200). The service-error notice is gated to the last 24 hours, cleared on the next successful request, excludes transient rate-limits, and now updates without a page reload (store-backed ErrorSection + an apiFetch middleware that syncs from any /settings response). The notice is reconciled with the saved key only after the save lands, so it can never reflect a key that wasn't stored.
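The gating rules for the notice (24-hour window, cleared on success, transient rate limits excluded) can be sketched as a tiny store. This Python sketch is illustrative only: `ServiceErrorStore` and its methods are hypothetical names, and timestamps are passed in explicitly so the behaviour is easy to check.

```python
from typing import Optional

NOTICE_WINDOW_SECONDS = 24 * 60 * 60               # "last 24 hours"
TRANSIENT_CODES = frozenset({"rate_limit_exceeded"})  # example value


class ServiceErrorStore:
    """Holds the most recent error that needs admin action."""

    def __init__(self) -> None:
        self._code: Optional[str] = None
        self._recorded_at: Optional[float] = None

    def record_error(self, code: str, now: float) -> None:
        if code in TRANSIENT_CODES:
            return  # transient rate limits never raise the notice
        self._code = code
        self._recorded_at = now

    def record_success(self) -> None:
        # Any successful request clears the notice.
        self._code = None
        self._recorded_at = None

    def visible_code(self, now: float) -> Optional[str]:
        """Code to show in the notice, or None if nothing should show."""
        if self._code is None or self._recorded_at is None:
            return None
        if now - self._recorded_at > NOTICE_WINDOW_SECONDS:
            return None  # older than the 24-hour window
        return self._code
```

Keeping the time check in `visible_code` means an old error simply stops showing without any cleanup job.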

Notes

  • Companion pro PR (Advanced-panel warning handling): Codeinwp/hyve#230

Test plan

  1. Save a valid funded key → "Settings saved", no warning/notice.
  2. Save a no-credit key → key saves, amber warning + dashboard notice appear immediately (no refresh).
  3. Save an invalid key → blocked, no notice, key not stored.
  4. With a no-credit key, add KB content → "Indexing failed… will retry" badge; add credits + reprocess → badge clears.
  5. Notice clears on next successful save; disappears after 24h.

⚠️ Ingestion pipeline refactor — behavior change across ALL data sources

The latest commit reworks how content is added to the Knowledge Base, not just error handling. Previously each of the four sources — Posts, Custom Data, Site Crawl, Sitemap — re-implemented the same tokenize → moderate → insert → embed flow with subtle differences. They now all funnel through a single method, DB_Table::ingest_document(). (Pro callers move over in Codeinwp/hyve#230.)

Because this touches every ingestion path, QA must re-test all four data sources end to end — this is a regression check on the whole "add to KB" surface, not only the OpenAI-error cases in the test plan above.

Also note one intentional behavior change: Posts "Add" is now fully synchronous — a failed embedding is surfaced immediately (warning toast) and is no longer retried in the background. Posts "Update" still retries via cron.

QA — confirm each source works as before (no regression)

For each of Posts, Custom Data, Site Crawl, Sitemap:

  • Add content → it appears in the Knowledge Base, status reaches indexed, and the chatbot can answer from it.
  • Update / edit → the change is re-indexed (Posts: on save via cron; Custom Data & Site Crawl: immediately; Sitemap: delete + re-add).
  • Delete → it's removed from the list and from the vector store.
  • Multi-chunk content (paste/import something >~1000 tokens) → adds, updates and deletes as a single document (not split into several entries), and all chunks index.
  • Moderation-flagged content → still blocked / shows the review as before.
  • With a no-credit or invalid key → the failure surfaces (toast + badge) and nothing is half-saved.

Run the above with both storage backends: default WordPress storage and Qdrant.

If anything regresses, this is isolated in its own commit on top of the error-handling work, so it can be reverted independently.

github-actions Bot commented Jun 30, 2026

Plugin build for 6005ae5 is ready 🛎️!

Validate API keys against the embeddings endpoint so a key is accepted
only when it is genuinely usable (valid auth and available credits).
Invalid keys are blocked, but account-level problems (no credits, rate
limits) save the key and warn instead of blocking — new, unfunded
accounts return a 429 even on the free moderation endpoint, so blocking
on validation was wrong.

Map OpenAI error codes to actionable, translatable messages from a
single source of truth (centralised code-list constants), surface
knowledge base indexing failures in the data UI with automatic retry,
and show a dashboard service-error notice that is gated to the last 24
hours, cleared on the next successful request, and updates without a
page reload via an apiFetch middleware.

Refs Codeinwp/hyve#149, Codeinwp/hyve#199, Codeinwp/hyve#200

Add DB_Table::ingest_document() as the shared tokenize -> moderate -> resolve
post -> insert chunks -> embed path used by every data source; add_post()
becomes a thin wrapper over it.

process_post() now returns its result and accepts an $allow_retry flag so a
caller can opt out of the background hyve_process_post retry. Posts add runs
synchronously (retry_async=false): a failed embedding is surfaced immediately
instead of being retried in the background.

Qdrant only reacted to 403, so a deleted or paused cluster (404) failed
silently on every chat while the Integrations page still showed it as
connected. Route all Qdrant exceptions through a single handler that persists
the error so it surfaces in the dashboard notice, and mark the connection
inactive on 401/403/404 so the UI no longer reports a connection that no
longer works. Add per-code messages so admins see the actual cause.
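
The single Qdrant handler described in the last commit might be shaped like this Python sketch. It is illustrative only: the state dictionary, the function name, and the message wording are hypothetical, and only the 401/403/404 set comes from the commit message.

```python
from typing import Any, Dict

DEACTIVATING_STATUSES = frozenset({401, 403, 404})  # from the commit message

# Placeholder wording; the plugin's per-code messages are not shown here.
QDRANT_MESSAGES: Dict[int, str] = {
    401: "Qdrant rejected the credentials.",
    403: "Qdrant denied access with the current credentials.",
    404: "The Qdrant cluster was not found; it may be deleted or paused.",
}
DEFAULT_QDRANT_MESSAGE = "The Qdrant request failed."


def handle_qdrant_error(status: int, state: Dict[str, Any]) -> Dict[str, Any]:
    """Persist the error and deactivate the connection when it cannot work."""
    state["last_error"] = {
        "status": status,
        "message": QDRANT_MESSAGES.get(status, DEFAULT_QDRANT_MESSAGE),
    }
    if status in DEACTIVATING_STATUSES:
        # The UI should stop reporting a connection that no longer works.
        state["connected"] = False
    return state
```

Other statuses still record an error for the notice but leave the connection flag untouched.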