Skip to content

ngmix: single source of truth for original vs reconvolved PSF columns#761

Open
cailmdaley wants to merge 14 commits into
ngmix_v2.0from
refactor/psf-column-grammar
Open

ngmix: single source of truth for original vs reconvolved PSF columns#761
cailmdaley wants to merge 14 commits into
ngmix_v2.0from
refactor/psf-column-grammar

Conversation

@cailmdaley

@cailmdaley cailmdaley commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

Follows the #741 review discussion of the ngmix PSF columns. The output conflated the original and reconvolved PSF — both *_psf and *_psfo carried the reconvolved fit, so g1_psf (expected ~0 on the metacal image) silently held the original-PSF value, and Tpsf did not mean what its name says. This PR makes each PSF quantity a single source of truth whose name matches its meaning.

Where to focus review

The science lives entirely in ngmix.py — specifically the two average_*_psf weighting functions (how each PSF family is combined across epochs) and the do_ngmix_metacal wiring (the metacal call + the new original-PSF pre-fit and its RNG handling). make_cat.py and the .param files are mechanical serialization of the new column names; the rest is test coverage.

Changes

  • Restore an explicit fit to the original PSF and store both the original and reconvolved fits, split by name.
  • Rename the ngmix columns to a consistent ESTIMATOR_COMPONENT_OBJECT grammar; update the shipped .param files.
  • Drop the dead r50 columns (derivable from T via cs_util.size).
  • Route the bare size conversions through cs_util.size.
  • Property + unit tests for the new grammar and both PSF families.

Diff at a glance (~2.0k insertions)

  • tests ~1.6k — the bulk: grammar/property/unit coverage of both PSF families + failure paths
  • python library ~430
  • config (.param) ~110
  • uv.lock / test-infra ~20

Notes for review

— Claude on behalf of Cail

cailmdaley and others added 12 commits June 20, 2026 14:53
…749)

Metacalibration has two PSFs and ngmix was exporting only one of them
under both column families, mislabeled. Fix the substance and the names
together.

Substance — the original image PSF is fit again. average_original_psf
fits each epoch's pristine gal_obs.psf (the psfex/mccd model stamp) with
the existing psf_runner, BEFORE boot.go reconvolves it, weight-averaged by
the galaxy obs.weight.sum() — the same scheme average_multiepoch_psf uses
for the reconvolution kernel. A shared _average_psf_fits core backs both
averagers; do_ngmix_metacal now returns (resdict, psf_res, psfo_res), with
psfo_res computed before boot.go so it sees the un-reconvolved PSF.

Names — the two families are now self-naming and never alias:
  *_psf_orig    original image PSF  (average_original_psf)
  *_psf_reconv  reconvolution kernel (average_multiepoch_psf)
Each carries g1/g2 (+_err), T (+_err) and r50 (+_err). The back-fill and
compile_results emit loop copy these keys straight through, so the column
name is the value's provenance. Galaxy ellipticity is the scalar g1/g2
pair. The previous code emitted one reconvolution-kernel result under both
g*_psfo_ngmix and g1_psf/g2_psf; that double-emit is gone.

Docstrings (compile_results, the two averagers, do_ngmix_metacal) now name
both PSFs explicitly; the old "psfo is the original image psf" note —
false since the original fit was dropped — is replaced by an accurate one.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…mar (#749)

Adopt NGMIX[m]_<COMPONENT>[_ERR]_<OBJECT>_<SHEAR>, with OBJECT one of GAL,
PSF_ORIG (original psfex/mccd image PSF) or PSF_RECONV (metacal
reconvolution kernel), reading the self-named keys ngmix now writes:

  galaxy ELL (2-vec)   -> scalar NGMIX_G1/G2_GAL,  NGMIX_G1/G2_ERR_GAL
  galaxy NGMIX_T_      -> NGMIX_T_GAL  (+ _ERR), SNR/FLUX/MAG/FLAGS_GAL
  NGMIX_ELL_PSFo (mis- -> genuine NGMIX_G1/G2_PSF_ORIG (from the restored
    labeled reconv)        original fit) + NGMIX_G1/G2_PSF_RECONV
  NGMIX_T_PSFo, NGMIX_  -> NGMIX_T_PSF_ORIG and NGMIX_T_PSF_RECONV, each its
    Tpsf                   own family's size (+ _ERR on both)

The reconvolution-kernel ellipticity gains its _ERR columns too, for
symmetry with PSF_ORIG. The d6531b1 dedup (NGMIX_T_PSFo <- "Tpsf") is
reverted: PSF_ORIG size now comes from the original fit ("T_psf_orig"), not
the reconvolution-kernel "Tpsf" — they are different PSFs again.
N_EPOCH / MCAL_FLAGS / MCAL_TYPES_FAIL keep their names (no PSF meaning).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
_fake_metacal_result now emits both self-named PSF key families with
distinct values (original PSF elliptical and smaller; reconvolution kernel
rounder and larger), so any regression to the old single-source aliasing
fails loudly. New tests:

  test_compile_results_psf_families_are_unaliased — g1/g2_psf_orig differ
    from g1/g2_psf_reconv, each tracing its own source
  test_average_original_psf_fits_each_gal_psf_with_galaxy_weight — the
    runner fits every epoch's gal_obs.psf, weight-averaged by the galaxy
    inverse-variance weight, failed-PSF epochs dropped

The size-columns test now checks both families' r50/T (orig != reconv).
Every do_ngmix_metacal call site unpacks the (res, psf_res, psfo_res)
3-tuple: test_ngmix, test_ngmix_weight_validation, test_mbias,
test_star_response.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…749)

The two in-repo param files are the renamed columns' shipped consumer
(read by scripts/python/create_final_cat.py). Map every NGMIX token to the
NGMIX_<COMPONENT>[_ERR]_<OBJECT>[_<SHEAR>] grammar:

  NGMIX_ELL_PSFo_NOSHEAR -> NGMIX_G1/G2_PSF_ORIG_NOSHEAR
  NGMIX_ELL_<SHEAR>      -> NGMIX_G1/G2_GAL_<SHEAR>
  NGMIX_ELL_ERR_NOSHEAR  -> NGMIX_G1/G2_ERR_GAL_NOSHEAR
  NGMIX_T[_ERR]_<SHEAR>  -> NGMIX_T[_ERR]_GAL_<SHEAR>
  NGMIX_Tpsf_<SHEAR>     -> NGMIX_T_PSF_RECONV_<SHEAR>
  NGMIX_T_PSFo_NOSHEAR   -> NGMIX_T_PSF_ORIG_NOSHEAR
  NGMIX_FLUX[_ERR]_<SHEAR> -> NGMIX_FLUX[_ERR]_GAL_<SHEAR>
  NGMIX_FLAGS_<SHEAR>    -> NGMIX_FLAGS_GAL_<SHEAR>

Non-NGMIX columns (NGMIX_MCAL_FLAGS, NGMIX_N_EPOCH, NGMIX_MOM_FAIL, and all
SExtractor/external columns) are untouched. Every uncommented NGMIX column
now matches a column make_cat._save_ngmix_data produces.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
)

No make_cat test existed for the renamed ngmix columns. Add four tests
driving SaveCatalogue._save_ngmix_data against a synthetic ngmix catalogue
(distinct orig vs reconv sentinels):

  - produced columns follow the new grammar; no old name (_PSFo, _Tpsf,
    NGMIX_ELL_) survives
  - NGMIX_G1_PSF_ORIG_* traces the orig source and NGMIX_G1_PSF_RECONV_* the
    reconv source, and the two differ (the un-aliasing #749 restores)
  - NGMIX_T_PSF_ORIG_* != NGMIX_T_PSF_RECONV_*
  - an obj_id absent from the ngmix cat keeps its sentinel fill
  - end-to-end: make_cat reads a catalogue ngmix's own compile_results +
    save_results serialised, guarding the reader against key-set drift

Soften the _save_ngmix_data grammar docstring so OBJECT and SHEAR read as
optional (NGMIX[m]_<COMPONENT>[_ERR][_<OBJECT>][_<SHEAR>]) — the metadata
columns NGMIX_MCAL_FLAGS / NGMIX_N_EPOCH / NGMIX_MCAL_TYPES_FAIL carry
neither.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…#749)

SCIENCE-CRITICAL. average_original_psf called psf_runner.go(gal_obs), and
PSFRunner.go sets .gmix (and .meta['result']) on the observation it fits —
here gal_obs.psf, the PSF metacal later deep-copies and consumes via
boot.go(gal_obs_list). A stray gmix survives that copy and is reused as the
MetacalFitGaussPSF fallback when its own admom+ML PSF fits both fail, so
objects the base branch dropped (BootPSFFailure) silently survived on this
branch, changing the galaxy/shear result set. A rename+add-column refactor
must be bit-identical on galaxy results, so fit gal_obs.psf.copy() and read
from the copy, leaving gal_obs pristine (verified: gal_obs.psf carries no
gmix and no leaked meta['result'] afterward, matching base-branch state).

Also in do_ngmix_metacal:
  - return a MetacalResult NamedTuple (resdict, reconv, orig) so the two PSF
    families are named not positional, guarding against a reconv/orig
    transposition; still a tuple, so positional unpacking is unchanged.
  - rename psfo_res -> psf_orig_res everywhere (no trailing-o shorthand).
  - soften the average_original_psf weighting note: same form as
    average_multiepoch_psf (weight.sum(), skip flags!=0) but the reconv path
    weights by the fixnoise-combined inverse variance of the noshear metacal
    image while the orig path uses the raw galaxy inverse variance.
  - note the r50_psf_orig/reconv columns are intermediate-only (make_cat
    reads T_psf_*/g, not r50, so they do not reach the final catalogue).

Tests: a do_ngmix_metacal guard asserting gal_obs.psf carries no gmix after
the original-PSF pre-fit; and, on an elliptical PSF stamp, T_psf_reconv >
T_psf_orig and |g_psf_reconv| < |g_psf_orig| (true by construction — the
reconvolution kernel is round and dilated), which catches a psf_res /
psf_orig_res transposition.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
r50 = SIGMA_TO_R50*sqrt(T/2) is a deterministic function of T, which is
already exported; the r50_* keys were written to the intermediate ngmix
catalogue and read by nobody (confirmed across shapepipe + sp_validation).
Remove the computation, schema entries, and the local SIGMA_TO_R50 constant;
downstream r50 is derivable via cs_util.size.T_to_r50.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Four property-test modules, each mutation-verified non-vacuous:
- grammar: orig/reconv never crossed for any distinct inputs; every emitted
  column matches the NGMIX_<COMPONENT>[_ERR]_<OBJECT>[_<SHEAR>] grammar; every
  NGMIX token in final_cat.param is producible by the writer.
- physics: T_reconv > T_orig and |g_reconv| <= |g_orig| over random elliptical
  PSFs (the dilation clause is the robust transposition catcher).
- no-op: average_original_psf leaves gal_obs.psf pristine (no gmix leak), so
  the restored original-PSF fit cannot alter the galaxy/shear results.
- averaging: epoch-weighted mean within per-epoch range, flagged epochs
  excluded, single survivor passes through, sentinel fills for absent objects.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
pyproject pins cs_util>=0.1.9 and the lock already resolved it from
PyPI, but to 0.2 — the last release *before* the size module. So the
dev image installs cs_util, yet `import cs_util.size` raises
ModuleNotFoundError: every path that would use the size-conversion
helpers is broken in-container.

cs_util 0.2.1 (the current PyPI release, "first release with
cs_util.size") ships cs_util/size.py with the Gaussian size web
(T_to_sigma / sigma_to_fwhm / sigma_to_r50 / ...). `uv lock
--upgrade-package cs-util` bumps only that pin; its dependency set is
unchanged, so no new heavy deps land. After the CI image rebuild
(`uv sync --frozen` in the Dockerfile) `import cs_util.size` resolves.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The Gaussian size conversions are now sourced from cs_util.size, the
single source of truth shared with sp_validation, instead of being
re-derived locally:

- ngmix.py get_noise window: np.sqrt(guess[4]/2) -> cs_size.T_to_sigma
  (guess[4] is the ngmix area parameter T = 2 sigma^2; bit-exact).
- make_cat.py PSF_FWHM: galaxy.sigma_to_fwhm -> cs_size.sigma_to_fwhm
  (bare conversion, no pixel_scale).
- utilities/galaxy.sigma_to_fwhm becomes a thin wrapper that keeps the
  pixel_scale rescaling and ShapePipe input validation but delegates
  the bare 2*sqrt(2 ln 2) factor to cs_util.size; this preserves its
  second caller (spread_model.py, which uses pixel_scale) and the
  test_utilities contract. Matches the old hard-coded 2.35482004503
  constant to <1e-11 relative.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The PSF property tests (reconv rounder/larger than orig, pristine prefit)
need a non-round PSF to exercise. Add an optional psf_shear=(0,0) param to
the make_data test helper that shears the Moffat PSF; the default is a
no-op, so every existing caller is unchanged. Test-helper only, no
pipeline behaviour change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- do_ngmix_metacal: seed the original-PSF pre-fit from a snapshot of the
  metacal RNG so it no longer advances the rng boot.go consumes — galaxy/shear
  results are now bit-identical to the pre-PR base (only the *_PSF_ORIG columns
  are added). Conservative: galaxy science unchanged vs base.
- remove the redundant galaxy.sigma_to_fwhm wrapper; inline
  cs_size.sigma_to_fwhm(sigma) * pixel_scale at call sites (spread_model);
  cs_util unchanged.
- drop leftover psf_fit_prior knob references (docstrings/comments/tests).
- canonicalize the ESTIMATOR_COMPONENT_OBJECT grammar spec string across all sites.
- trim over-explained copy-PSF / leakage / weighting prose to one canonical spot;
  document that bit-identity rests on BOTH pristine gal_obs.psf AND the snapshot RNG.
- add tests: PSF columns survive a failed-fit NaN-fill; all-epochs-failed raises;
  grammar token check runs over both shipped param files. Fix a stale cross-ref.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
cailmdaley and others added 2 commits June 20, 2026 17:08
The 12 hand-written g*/T_psf_{reconv,orig} assignments collapse to a single
template applied over (family, source) — same keys, same values, but the two
PSF families are now generated symmetrically, so one can't gain or misname a
column the other lacks. Addresses a drift/hardcoding risk Cail flagged in review.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The two parallel 6-call blocks (PSF_ORIG / PSF_RECONV) collapse to one
template over (family, obj): the FITS column name and the res-key it reads are
now generated from the same pair, so the write seam (where the grammar drift
bit us) can't desync the two families. Behaviour-identical column set + values.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@cailmdaley

Copy link
Copy Markdown
Contributor Author

Hello @aguinot @martinkilbinger, this is my first attempt at a fix to all the PSF column issues that cropped up in #741. I decided to be very ambitious and define a common grammar for all column names (not just the PSF columns) to reduce ambiguity; this is quite far reaching since it changes everything that consumes the catalog, and will require a second PR in sp_validation to propagate these changes. However I think it will be worth it and is largely mechanical to propagate. If you think this is too aggressive please let me know and we can consider descoping to only touching the PSF columns. The only real science changes are the new ngmix fit to the original PSF (and the averaging over epochs) in ngmix.py.

Cheers, Cail

@aguinot

aguinot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Hello,
I would remove GAL from the ngmix quantities as the objects are not necessarily glaxies and the name let's suppose that a selection has been done and that you have a sample of galaxies only safe to use, while it is not the case. And it also makes the names longer.
You can also keep the spread model up-to-date but, it should not be used to apply a cut as it will create selection effects not taken into account by the response matrix.
I would also use a single size definition everywhere to avoid confusion. If you decide to use the trace T (which I strongly recommend!) only use that and remove columns such as PSF_FWHM (or converrt them to T), it can be derived from T.
Finally, I don't know why I used full capital names in the output catalogue but it might not be a good idea. Feel free to use lower case names.

@sachaguer

Copy link
Copy Markdown

Some notes on the column labelling for the star catalogue. I discussed with @fabianhervaspeters that the HSM shapes are e-types ellipticities, whereas ngmix returns g-type ellipticities. The new labelling shows that, for ngmix, we should ensure the naming is consistent for the HSM quantities and potentially add a function to easily convert between them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants