test: unify a shared flaky-rerun marker; fix nwis collection-time call #329
Merged
The Ubuntu CI on main (run 27969553989, on the DOI-USGS#326 merge) failed: a transient 503 from the live legacy NWIS site service hit the live tests in nwis_test.py. Windows passed only by timing. Not a regression from DOI-USGS#326: the live tests and the offending call date to DOI-USGS#62.

Two things made the transient fatal:
- nwis_test.py had no flaky-rerun marker (waterdata/ngwmn got one in DOI-USGS#325), and
- TestTZ fetched its sites in the class body (`sites, _ = what_sites(...)`), i.e. at collection time, where a 503 aborts the whole module and can never be rerun (rerunfailures retries failed tests, not collection errors).

Changes:
- Add one shared marker, `conftest.flaky_api`, with a unified transient pattern list, and apply it to every live suite: nwis_test.py / waterdata_test.py / ngwmn_test.py (module-level) and utils_test.py::Test_query (class-level; the only other live suite, found by auditing each module behind a dead proxy). Replaces three drifted inline copies.
- The status pattern now matches both the OGC path's "<status>:" and the legacy query path's "HTTP <status>" message shapes, so one list serves all four. This closes a latent gap: ngwmn's old marker omitted ServiceUnavailable and the chunked QuotaExhausted/ServiceInterrupted wrappers (DOI-USGS#325 fixed waterdata but missed ngwmn), so a 503 in an ngwmn live test would have failed CI too.
- Move TestTZ's class-body fetch into a class-scoped `sites` fixture so it runs at test time, where the marker can retry a transient.

Everything else is mocked, module-skipped (nadp), or, for the chunking ..._on_real_transport test, a deterministic local http.server, not external.

Verified: nwis collection is network-free; the markers attach with the unified config; ngwmn now covers ServiceUnavailable/ServiceInterrupted; live Test_query and TestTZ pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Sjb14HkwuCydKSKMsaXsgd
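To illustrate why a class-body fetch cannot be retried, here is a minimal, stdlib-only sketch. The names `fetch_sites_stub`, `EagerTZ`, `LazyTZ`, and `call_with_retry` are hypothetical and are not code from this PR; the stub stands in for the live `what_sites` call and fails once with a simulated 503.

```python
class ServiceUnavailable(Exception):
    """Stand-in for a transient HTTP 503 from a live service."""


attempts = {"count": 0}


def fetch_sites_stub(state_cd):
    # Hypothetical stand-in for a live call: the first call fails, later calls succeed.
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise ServiceUnavailable("HTTP 503")
    return [f"{state_cd}-site-1", f"{state_cd}-site-2"]


# Class-body fetch: the call runs while the class statement executes,
# i.e. when the module is imported (pytest collection time).
try:
    class EagerTZ:
        sites = fetch_sites_stub("MD")
    eager_defined = True
except ServiceUnavailable:
    eager_defined = False  # an import-time failure leaves no test to rerun

attempts["count"] = 0  # reset the stub so its next first call fails again


class LazyTZ:
    @staticmethod
    def load_sites():
        # Deferred fetch: runs only when called, so a caller can retry it.
        return fetch_sites_stub("MD")


def call_with_retry(func, retries=1):
    # Minimal retry loop standing in for a test-level rerun mechanism.
    for attempt in range(retries + 1):
        try:
            return func()
        except ServiceUnavailable:
            if attempt == retries:
                raise


lazy_sites = call_with_retry(LazyTZ.load_sites)
```

Under pytest, a class-scoped fixture would play the role of `load_sites` and the rerun marker the role of `call_with_retry`; the sketch only shows the timing difference, not the plugin's actual behavior.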
Force-pushed from 095b964 to 704f734.
Problem

The Python package CI on `main` failed (run 27969553989, on the #326 merge): `lint/type-check` and all Windows test jobs passed, but all three Ubuntu jobs failed with the same cause. A transient 503 from the live legacy NWIS service hit the live tests in `tests/nwis_test.py`. Windows passed only by timing. Not a regression from #326: the live tests and the offending call date to #62.

Two things made a transient outage fatal instead of self-healing:
- `waterdata_test.py` / `ngwmn_test.py` retry transient 429/5xx via `pytest.mark.flaky` (#325, "test(waterdata): rerun flaky transient 5xx/429 from the chunked fan-out"); `nwis_test.py` had none.
- `TestTZ` fetched its sites in the class body (`sites, _ = what_sites(stateCd="MD")`), so the 503 raised during collection (exit 2, `Interrupted: 1 error during collection` on 3.13). `pytest-rerunfailures` reruns failed tests, never collection errors.

What this does
Audited every test module for live-API calls (ran each behind a dead proxy: genuine calls fail, mocked ones pass). Then unified the rerun handling instead of adding a fourth inline copy:
- One shared marker, `conftest.flaky_api`, with a single transient-pattern list. Applied to every live suite:
  - `nwis_test.py`, `waterdata_test.py`, `ngwmn_test.py`: module-level `pytestmark = flaky_api`.
  - `utils_test.py::Test_query`: class-level `@flaky_api` (the only other live suite the audit found; the rest of that module is mocked).
- The status pattern matches both message shapes, the OGC path's `"<status>:"` and the legacy `query` path's `"HTTP <status>"`, so one list serves all four.
- `ngwmn`'s old marker omitted `ServiceUnavailable` and the chunked `QuotaExhausted`/`ServiceInterrupted` wrappers (#325 added those to `waterdata` but missed `ngwmn`), so a 503 in an NGWMN live test would have failed CI too. It now inherits the full set.
- `TestTZ`'s class-body fetch moves into a class-scoped `sites` fixture, so it runs at test time where the marker can retry it.

Everything else is mocked, module-skipped (`nadp`), or, for the chunking `..._on_real_transport` test, a deterministic local `http.server`, not an external service.

Verification
- `nwis_test.py` collection is now network-free (`--collect-only` makes no live call).
- The markers attach with the unified config; `ngwmn` now covers `ServiceUnavailable`/`ServiceInterrupted`.
- The pattern matches `ServiceUnavailable: HTTP 503` and `... 503:` but not a deterministic `AssertionError`.
- Live `Test_query` + `TestTZ` pass; mocked suites unaffected.

🤖 Generated with Claude Code
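The matching behavior described under Verification can be sketched with plain `re`. This is an assumed pattern list for illustration only: the actual contents of `conftest.py` are not reproduced here, and `TRANSIENT_PATTERNS` and `is_transient` are hypothetical names.

```python
import re

# Assumed patterns; the real list in conftest.py may differ.
TRANSIENT_PATTERNS = [
    r"\b(?:429|5\d\d):",       # OGC path message shape: "<status>:"
    r"HTTP (?:429|5\d\d)\b",   # legacy query path message shape: "HTTP <status>"
    r"ServiceUnavailable",
    r"ServiceInterrupted",
    r"QuotaExhausted",
]


def is_transient(message: str) -> bool:
    """Return True if any transient pattern is found in the error message."""
    return any(re.search(pattern, message) for pattern in TRANSIENT_PATTERNS)


# Both message shapes match; a deterministic assertion failure does not.
legacy_shape = is_transient("ServiceUnavailable: HTTP 503")
ogc_shape = is_transient("RuntimeError: 503: Service Unavailable")
rate_limited = is_transient("HTTP 429 Too Many Requests")
deterministic = is_transient("AssertionError: assert 3 == 4")
```

With `pytest-rerunfailures`, a list like this could be passed as the `only_rerun` argument of `pytest.mark.flaky`, so that only failures matching a transient pattern are rerun; the rerun count and delay would be separate choices not shown here.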