feat(obs4ref): add CEDA obs_for_ref_v2 reference datasets #778
Open
lewisjared wants to merge 2 commits into
Register 33 obs4REF reference-data files covering 12 CEDA obs_for_ref_v2 observational datasets, and drop 8 superseded entries (WECANN-1-0 and NOAA-NCEI/WOA2023 at the old 20250516 version). Each new entry records its originating CEDA download URL as a provenance comment so the upstream source stays recoverable even though the registry path uses the DRS grid label rather than the CEDA resolution string.
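As a rough illustration of how the provenance comments could be read back out of the registry, the sketch below pairs each `# source:` comment with the registry line that follows it. It assumes the registry is a pooch-style text file (one `path hash` entry per line), that the comment is written exactly as `# source: <url>`, and that hashes carry an `md5:` prefix. The helper name `parse_source_comments` and the sample paths are hypothetical and do not come from the repository.

```python
def parse_source_comments(registry_text: str) -> dict[str, str]:
    """Map each registry path to the URL in the '# source:' comment above it."""
    sources: dict[str, str] = {}
    pending = None  # most recent source URL not yet attached to an entry
    for raw in registry_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Comment line: remember it only if it is a source comment.
            body = line.lstrip("#").strip()
            if body.startswith("source:"):
                pending = body[len("source:"):].strip()
            continue
        # Registry entry: the first whitespace-separated field is the path.
        path = line.split()[0]
        if pending is not None:
            sources[path] = pending
            pending = None
    return sources


# Hypothetical sample; real paths and hashes live in the registry file.
SAMPLE = """\
# source: https://dap.ceda.ac.uk/example/path/file_a.nc
obs4REF/example/gn/file_a.nc md5:0123456789abcdef0123456789abcdef
obs4REF/example/gn/file_b.nc md5:fedcba9876543210fedcba9876543210
"""

if __name__ == "__main__":
    for path, url in parse_source_comments(SAMPLE).items():
        print(path, "<-", url)
```

Entries without a preceding source comment, such as `file_b.nc` in the sample, are simply left out of the mapping.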
No actionable comments were generated in the recent review.
Walkthrough
This PR updates the obs4REF dataset registry reference file, replacing entries for several providers (ARCCSS, ColumbiaU WECANN, DWD, NOAA WOA2023, and others) with newer versioned datasets, updated MD5 hashes, and added source comments. A corresponding changelog entry documents the addition of 33 new files across 12 dataset categories.
Changes: Obs4REF reference registry update
Pre-merge checks: 5 passed
Summary
Registers 33 new obs4REF reference-data files covering 12 CEDA obs_for_ref_v2 observational datasets, and drops 8 superseded registry entries. The corresponding NetCDF files have been uploaded to the obs4ref store (ref-obs4ref-public) and resolve at https://obs4ref.climate-ref.org/obs4REF/.... Registry entry count goes 58 -> 83.
Datasets added / upgraded
LORA-1-0 (mrro), WECANN-1-0 (gpp/hfls/hfss), ESACCI-CLOUD-AVHRR-AMPM-3-0 (12 radiation/cloud variables), GFED-5-0 (burntFractionAll), SAGE-CCI-OMPS (o3, zonal), FLUXNET2015-1-0 (gpp, site), HWSD-2-0 (cSoil), CALIPSO-ICECLOUD-1-0 (cli), WOA-23 under NOAA-NCEO-OCL (no3/o2/po4/si/so/thetao/ohc/ohcJm2), RAPID-2023-1a (msftmz), CCI-CryoClim-FSC-1 (snc), and Hoffman-1-0 (fgco2/nbp).
Superseded entries removed
WECANN-1-0 at the old 20250516 version (3 files) replaced by v20250902, and NOAA-NCEI/WOA2023 at 20250516 (5 files) replaced by the NOAA-NCEO-OCL/WOA-23 source. The old objects remain in the bucket as unreferenced orphans and can be pruned separately.
Provenance
Each new entry is preceded by a # source: comment carrying its originating CEDA dap.ceda.ac.uk download URL. These lines are documentary only (pooch skips comment lines); fetches still resolve against the obs4ref base URL, not CEDA. The comment is the only place the CEDA path is recoverable because the registry path uses the DRS grid label (gn/gm/gr/gnz/site) rather than the CEDA resolution string.
Verification
All 33 files md5-match the CEDA listing and the repo's own scripts/create-registry.py output. import climate_ref loads the registry as 83 entries with 0 per-file URL overrides. Every uploaded key returns HTTP 200 at the public base URL, including each new grid/frequency shape (site, gnz, gr, fx, monC).
Follow-up
Several diagnostics will need to be updated to consume these newer datasets (notably those still pointing at the superseded WECANN version, NOAA-NCEI/WOA2023, or the old RAPID source). That work is tracked separately.
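For anyone wanting to repeat the md5 check from the Verification section against a local copy of the files, something along the following lines might work. This is a minimal sketch and not the logic of scripts/create-registry.py: it assumes the expected hashes are available as a mapping from relative path to an `md5:`-prefixed hex digest, and the helper names `md5_of` and `find_mismatches` are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(expected: dict[str, str], root) -> list[str]:
    """Return the registry paths whose file is missing or whose MD5 differs."""
    root = Path(root)
    bad = []
    for rel_path, want in expected.items():
        target = root / rel_path
        # Accept both "md5:<hex>" and bare "<hex>" forms of the expected hash.
        want_hex = want.removeprefix("md5:").lower()
        if not target.is_file() or md5_of(target) != want_hex:
            bad.append(rel_path)
    return bad
```

An empty list from `find_mismatches` would indicate that every listed file is present under `root` and matches its recorded hash. Checking that each key returns HTTP 200 at the public base URL is a separate step that this sketch does not cover.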