feat(obs4ref): add CEDA obs_for_ref_v2 reference datasets #778

Open
lewisjared wants to merge 2 commits into main from feat/obs4ref-ceda-datasets

lewisjared commented Jul 1, 2026

Contributor

Summary

Registers 33 new obs4REF reference-data files covering 12 CEDA obs_for_ref_v2 observational datasets, and drops 8 superseded registry entries. The corresponding NetCDF files have been uploaded to the obs4ref store (ref-obs4ref-public) and resolve at https://obs4ref.climate-ref.org/obs4REF/.... The registry entry count goes from 58 to 83 (58 - 8 + 33).

Datasets added / upgraded

LORA-1-0 (mrro), WECANN-1-0 (gpp/hfls/hfss), ESACCI-CLOUD-AVHRR-AMPM-3-0 (12 radiation/cloud variables), GFED-5-0 (burntFractionAll), SAGE-CCI-OMPS (o3, zonal), FLUXNET2015-1-0 (gpp, site), HWSD-2-0 (cSoil), CALIPSO-ICECLOUD-1-0 (cli), WOA-23 under NOAA-NCEO-OCL (no3/o2/po4/si/so/thetao/ohc/ohcJm2), RAPID-2023-1a (msftmz), CCI-CryoClim-FSC-1 (snc), and Hoffman-1-0 (fgco2/nbp).

Superseded entries removed

WECANN-1-0 at the old 20250516 version (3 files) is replaced by v20250902, and NOAA-NCEI/WOA2023 at 20250516 (5 files) is replaced by the NOAA-NCEO-OCL/WOA-23 source. The old objects remain in the bucket as unreferenced orphans and can be pruned separately.
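The counts quoted above can be cross-checked with a few lines of arithmetic. This is only a sketch: it assumes one registry file per variable listed in parentheses (which is how the per-dataset counts below were read off the description), and the dictionary names are illustrative rather than anything in the repo.

```python
# Files per dataset, read off the "Datasets added / upgraded" list.
# Assumption: each listed variable corresponds to exactly one registry file.
files_per_dataset = {
    "LORA-1-0": 1,                      # mrro
    "WECANN-1-0": 3,                    # gpp, hfls, hfss
    "ESACCI-CLOUD-AVHRR-AMPM-3-0": 12,  # 12 radiation/cloud variables
    "GFED-5-0": 1,                      # burntFractionAll
    "SAGE-CCI-OMPS": 1,                 # o3
    "FLUXNET2015-1-0": 1,               # gpp
    "HWSD-2-0": 1,                      # cSoil
    "CALIPSO-ICECLOUD-1-0": 1,          # cli
    "WOA-23": 8,                        # no3, o2, po4, si, so, thetao, ohc, ohcJm2
    "RAPID-2023-1a": 1,                 # msftmz
    "CCI-CryoClim-FSC-1": 1,            # snc
    "Hoffman-1-0": 2,                   # fgco2, nbp
}

# Superseded entries removed, per the section above.
removed_entries = {
    "WECANN-1-0 @ 20250516": 3,
    "NOAA-NCEI/WOA2023 @ 20250516": 5,
}

datasets = len(files_per_dataset)
added = sum(files_per_dataset.values())
dropped = sum(removed_entries.values())

entries_before = 58
entries_after = entries_before - dropped + added

print(datasets, added, dropped, entries_after)
```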

Provenance

Each new entry is preceded by a `# source:` comment carrying its originating CEDA dap.ceda.ac.uk download URL. These lines are documentary only (pooch skips comment lines); fetches still resolve against the obs4ref base URL, not CEDA. The comment is the only place the CEDA path is recoverable, because the registry path uses the DRS grid label (gn/gm/gr/gnz/site) rather than the CEDA resolution string.
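To illustrate how the provenance comments could be recovered by tooling, here is a minimal sketch of a parser that pairs each `# source:` line with the entry that follows it. It is a stand-in written for this example, not pooch's own loader, and it assumes a whitespace-separated `path hash` layout with `md5:`-prefixed hashes. The paths, URL path, and hashes in `SAMPLE` are made up.

```python
def parse_registry(text):
    """Parse registry text into (registry, sources).

    registry maps file path -> hash string.
    sources maps file path -> URL from the preceding "# source:" comment.
    """
    registry = {}
    sources = {}
    pending_source = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Comment lines never become entries; remember a source URL if present.
            body = line.lstrip("#").strip()
            if body.startswith("source:"):
                pending_source = body[len("source:"):].strip()
            continue
        name, file_hash = line.split()[:2]
        registry[name] = file_hash
        if pending_source is not None:
            sources[name] = pending_source
            pending_source = None
    return registry, sources


# Hypothetical registry fragment: none of these paths or hashes are real.
SAMPLE = """\
# source: https://dap.ceda.ac.uk/EXAMPLE/path/file_a.nc
obs4REF/EXAMPLE/file_a_gn.nc md5:0123456789abcdef0123456789abcdef
obs4REF/EXAMPLE/file_b_gr.nc md5:fedcba9876543210fedcba9876543210
"""

registry, sources = parse_registry(SAMPLE)
print(len(registry), "entries;", len(sources), "with a recorded source")
```

Entries without a preceding comment simply have no key in `sources`, which mirrors the situation described above where only the new entries carry a CEDA URL.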

Verification

All 33 files md5-match the CEDA listing and the repo's own `scripts/create-registry.py` output. Importing `climate_ref` loads the registry as 83 entries with 0 per-file URL overrides. Every uploaded key returns HTTP 200 at the public base URL, including each new grid/frequency shape (site, gnz, gr, fx, monC).
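For reviewers who want to repeat the checksum part of this locally, the comparison could look roughly like the sketch below. It is not the logic of `scripts/create-registry.py` (which is not shown in this PR description); the helper names and the `md5:<hex>` hash format are assumptions, and the demo runs against a throwaway temporary file rather than a real NetCDF file.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_registry(path, registry_hash):
    """Compare a file against a registry hash of the assumed form 'md5:<hex>'."""
    algorithm, _, expected = registry_hash.partition(":")
    if algorithm != "md5":
        raise ValueError(f"unsupported hash algorithm: {algorithm!r}")
    return md5_of(path) == expected.lower()


# Demo on a temporary stand-in file (not real observational data).
payload = b"not a real NetCDF file"
with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / "sample.nc"
    sample.write_bytes(payload)
    good_hash = "md5:" + hashlib.md5(payload).hexdigest()
    ok = matches_registry(sample, good_hash)
    mismatch = matches_registry(sample, "md5:" + "0" * 32)

print(ok, mismatch)
```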

Follow-up

Several diagnostics will need to be updated to consume these newer datasets (notably those still pointing at the superseded WECANN version, NOAA-NCEI/WOA2023, or the old RAPID source). That work is tracked separately.

Summary by CodeRabbit

  • New Features

    • Expanded the Obs4REF reference registry with many new observational dataset entries across cloud, radiation, burnt area, ozone, fluxes, soil carbon, ice cloud, ocean biogeochemistry, sea ice and ocean carbon fluxes.
  • Bug Fixes

    • Replaced older WECANN and WOA2023 references with newer versions.
    • Updated several existing dataset entries and their metadata to the latest available records.

Register 33 obs4REF reference-data files covering 12 CEDA obs_for_ref_v2
observational datasets, and drop 8 superseded entries
(WECANN-1-0 and NOAA-NCEI/WOA2023 at the old 20250516 version).

Each new entry records its originating CEDA download URL as a
provenance comment so the upstream source stays recoverable even though
the registry path uses the DRS grid label rather than the CEDA
resolution string.

coderabbitai Bot commented Jul 1, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 0727a54f-1f33-4fed-912a-a6c32bdd32d1

📥 Commits

Reviewing files that changed from the base of the PR and between 383bd3f and 7439171.

📒 Files selected for processing (2)
  • changelog/778.feature.md
  • packages/climate-ref/src/climate_ref/dataset_registry/obs4ref_reference.txt

📝 Walkthrough

This PR updates the obs4REF dataset registry reference file, replacing entries for several providers (ARCCSS, ColumbiaU WECANN, DWD, NOAA WOA2023, and others) with newer versioned datasets, updated MD5 hashes, and added source comments. A corresponding changelog entry documents the addition of 33 new files across 12 dataset categories.

Changes

Obs4REF reference registry update

| Layer / File(s) | Summary |
| --- | --- |
| Updated dataset entries and provenance comments: packages/climate-ref/src/climate_ref/dataset_registry/obs4ref_reference.txt | Replaces ARCCSS LORA-1-1, ColumbiaU WECANN, and NOAA WOA2023 entries with newer versioned datasets (LORA-1-0, updated WECANN, NOAA-NCEO-OCL WOA-23), updates DWD ESACCI-Cloud-AVHRR-AMPM and CERES/CMAP/GPCP MD5s and paths, adds new FMI, Fluxnet2015, IIASA-FAO, DoESS-UCI GFED, NR CCI-CryoClim-FSC, and UCI-ORNL Hoffman entries, and adds `# source:` comment lines throughout. |
| Changelog entry for registry update: changelog/778.feature.md | Adds a changelog entry describing the addition of 33 new reference-data files across 12 observational dataset categories and the replacement of older WECANN and WOA2023 entries. |

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarises the main change: adding CEDA obs_for_ref_v2 reference datasets to obs4REF. |
| Description check | ✅ Passed | The description is detailed and covers the summary, datasets, provenance, verification, and follow-up. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |