Add executor-backed and MPI batch evaluators #685

Open
hmgaudecker wants to merge 8 commits into feat/user-batch-evaluator from feat/executor-batch-evaluator

@hmgaudecker

Member

Stacked on #684 (the batch_evaluator kwarg). Review/merge #684 first; this PR's diff is against that branch.

What

Adds two batch evaluators so a single-driver optimizer can fan its batched criterion
evaluations across a concurrent.futures.Executor:

  • executor_batch_evaluator(executor) — a generic factory returning a
    BatchEvaluator-protocol callable backed by any executor (ProcessPoolExecutor,
    ThreadPoolExecutor, Dask, mpi4py.futures.MPIPoolExecutor, …). Order-preserving via
    executor.map; reuses the existing unpack/catch machinery so error_handling
    ("raise" reraises, "continue" → traceback string) matches the other evaluators.
  • mpi_batch_evaluator — a named peer to joblib/pathos/threading, registered
    so batch_evaluator="mpi" works (added to BatchEvaluatorLiteral and
    process_batch_evaluator). Lazily imports mpi4py (clear error → optional
    optimagic[mpi] extra), configures cloudpickle as the MPI serializer, and caches a
    module-level MPIPoolExecutor (one pool per process; a per-batch pool would be
    catastrophic).
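The executor-backed evaluator described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the PR's code: the call signature `(func, arguments, *, error_handling, unpack_symbol)` is assumed to mirror optimagic's existing batch evaluators, and `_unpack_and_call` is a hypothetical stand-in for the PR's unpack/catch machinery. Ordering comes for free because `Executor.map` yields results in input order.

```python
import traceback
from concurrent.futures import Executor, ThreadPoolExecutor
from functools import partial


def _unpack_and_call(func, unpack_symbol, error_handling, argument):
    """Hypothetical helper: call func on one argument, optionally catching errors."""
    try:
        if unpack_symbol == "*":
            return func(*argument)
        if unpack_symbol == "**":
            return func(**argument)
        return func(argument)
    except Exception:
        if error_handling == "raise":
            raise
        # "continue": hand back the formatted traceback instead of a value
        return traceback.format_exc()


def executor_batch_evaluator(executor: Executor):
    """Return a callable that evaluates a batch of arguments on `executor`."""

    def batch_evaluator(func, arguments, *, error_handling="continue", unpack_symbol=None):
        task = partial(_unpack_and_call, func, unpack_symbol, error_handling)
        # executor.map yields results in input order, whatever the completion order;
        # with "raise", the worker's exception is re-raised while iterating
        return list(executor.map(task, arguments))

    return batch_evaluator


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        evaluate = executor_batch_evaluator(pool)
        print(evaluate(lambda x: x**2, [1, 2, 3]))  # [1, 4, 9]
```

Because the factory only relies on `Executor.map`, the same adapter works for thread pools, process pools, or any third-party executor; whether the task survives transport to another process is a separate pickling question.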

Plus an optimagic[mpi] optional extra and a how-to (how_to_distributed_optimization)
that explains the single-driver model and the python -m mpi4py.futures launch
precondition.

Why

The batched evaluations that an algorithm like tranquilo requests were locked to single-machine
backends. On a cluster you want them spread across nodes. The right MPI pattern is a
single optimizer on the driver rank with the worker ranks parked by the
mpi4py.futures launcher — not running the optimizer on every rank (that diverges
under floating-point nondeterminism). An executor's single-submitter model makes that
the only expressible pattern, so the trap is structurally avoided.

Pickling (the crux)

The stdlib ProcessPoolExecutor serializes tasks with plain pickle regardless of start
method, so closures and locally defined criteria (such as optimagic's partial_func_of_params
output) would be rejected under spawn/forkserver and under MPI. A module-level
_CloudpickleTask wrapper carries a cloudpickle payload that plain pickle transports
intact — giving joblib/loky-level parity for any executor. cloudpickle is already a hard
dependency, so this adds nothing new.

Tests

37 tests pass. Notably: a closure evaluated through ProcessPoolExecutor(mp_context=spawn)
(the case that fails without the wrapper), the ThreadPool path, raise/continue parity,
process_batch_evaluator("mpi") resolution, and the no-mpi4py clear-error path. All
local/deterministic — no cluster needed.

Points for reviewer attention

  1. executor_batch_evaluator has no string alias (unlike "mpi"), so it's only
    reachable via from optimagic.batch_evaluators import executor_batch_evaluator.
    Should it be re-exported at the top level (optimagic.executor_batch_evaluator) for
    discoverability? Left as-is for now.
  2. Cached MPI executor lifecycle: mpi_batch_evaluator caches one module-level
    MPIPoolExecutor, never explicitly shut down (dies with the process). No API to reset
    between independent optimizations in one process — fine for the
    one-optimization-per-process HPC model, but worth a conscious call.
  3. "No workers" detection uses executor.num_workers == 0. This is best-effort: when the
    program is launched without -m mpi4py.futures, mpi4py may fall back to spawning workers
    dynamically via MPI_Comm_spawn rather than reporting 0. The how-to and the error message
    both state the launcher precondition
    explicitly. Could use a maintainer's eye with cluster access — I couldn't run real MPI.

Notes

  • CHANGES.md references {gh}`NNN`; I'll replace it with this PR's number once assigned.
  • pixi.lock intentionally not regenerated: the mpi extra isn't pulled into any
    pixi environment (so pixi install --frozen passes against the current lock), and a
    running pixi lock here would have forced an unrelated repo-wide v6→v7 lock-format upgrade. Left for the
    maintainer.

🤖 Generated with Claude Code

hmgaudecker and others added 8 commits June 25, 2026 11:05
Adds `executor_batch_evaluator(executor)`, which adapts any
`concurrent.futures.Executor` (ProcessPoolExecutor, ThreadPoolExecutor,
mpi4py.futures.MPIPoolExecutor, …) to optimagic's BatchEvaluator protocol.
A module-level `_CloudpickleTask` wrapper serializes the criterion with
cloudpickle so closures and locally defined functions survive any executor's
plain-pickle transport (e.g. spawn-based process pools).

Adds the named `mpi_batch_evaluator`, symmetric to joblib/pathos/threading:
it lazily imports mpi4py (optional `optimagic[mpi]` extra), configures
cloudpickle as MPI's serializer, caches a single module-level MPIPoolExecutor
across batches, and raises a clear error when no worker ranks are available
(program not launched via `python -m mpi4py.futures`). It delegates the actual
mapping to executor_batch_evaluator. Registers "mpi" in BatchEvaluatorLiteral
and process_batch_evaluator.

Adds a how-to guide on distributed optimization with MPI and a CHANGES entry.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds a pixi `mpi` feature (mpi4py + mpich, Linux-only) and a
`tests-mpi-py314` environment, an `mpi` pytest marker, and an
integration test that launches `_mpi_helper.py` under
`mpiexec -n 3 python -m mpi4py.futures`. The helper fans a locally
defined closure out through `mpi_batch_evaluator` and asserts the
results come back in input order, exercising cloudpickle-over-MPI
transport end to end. The test skips only when mpiexec / mpi4py are
absent.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
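The launch-and-skip logic of that integration test can be sketched with the standard library alone. The function names are hypothetical and `_mpi_helper.py` is only referenced by path; in a pytest suite, `mpi_available()` would feed a `pytest.mark.skipif` marker. Using `sys.executable` instead of a bare `python` is a design choice here, so the ranks run the same interpreter and environment as the test process.

```python
import importlib.util
import shutil
import subprocess
import sys
from pathlib import Path


def mpi_available() -> bool:
    """True only if both the mpiexec launcher and the mpi4py package are present."""
    has_launcher = shutil.which("mpiexec") is not None
    has_mpi4py = importlib.util.find_spec("mpi4py") is not None
    return has_launcher and has_mpi4py


def mpi_command(helper: Path, n_procs: int = 3) -> list[str]:
    """Build `mpiexec -n <n_procs> <python> -m mpi4py.futures <helper>`."""
    return [
        "mpiexec", "-n", str(n_procs),
        sys.executable, "-m", "mpi4py.futures", str(helper),
    ]


def run_mpi_helper(helper: Path, timeout: float = 120.0) -> subprocess.CompletedProcess:
    # check=True turns a failing assertion inside the helper into CalledProcessError
    return subprocess.run(
        mpi_command(helper), check=True, capture_output=True, text=True, timeout=timeout
    )
```

With three processes, `python -m mpi4py.futures` keeps rank 0 as the driver running the helper script and parks the other two ranks as pool workers.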
Adds a `run-tests-mpi` CI job (ubuntu-latest, `tests-mpi-py314`) that
first asserts mpiexec + mpi4py are present so the MPI tests cannot
silently skip, then runs them. Modernizes the shared CI tooling:
setup-pixi v0.9.4 → v0.9.6, pinned pixi-version v0.65.0 → v0.70.2 (the
pixi that wrote the v7 lock), actions/checkout v4 → v5, and
codecov-action v4 → v5 across all jobs. Regenerates pixi.lock to v7
with the new MPI environment.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
`executor_batch_evaluator` is the only entry point for a bring-your-own-executor batch evaluator
(the named evaluators are reachable by string), so it belongs in the public
namespace next to the BatchEvaluator protocol.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Driver/worker distributed optimization needs the worker ranks to evaluate the
driver's broadcast points through the exact same conversion and value
post-processing the driver uses — otherwise a worker handed the raw user
criterion receives the optimizer's internal parameter vector instead of the
external params and fails. `build_internal_fun` exposes that internal
`x -> (value, history_entry, log_entry)` callable so a worker can build an
interchangeable evaluator on its own resources from the same
params/bounds/constraints/algorithm.

The per-point batch evaluation is now pure (no logging side effect); the
process that owns the history and log database records every point — including
those evaluated on a remote worker — exactly once, stamping each with the
running optimization step. This also makes the joblib path log from the parent
rather than concurrently from each subprocess.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
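To see why a worker needs the internal function rather than the raw criterion, consider a toy version. Everything below is hypothetical: `make_internal_fun` stands in for `build_internal_fun`, and the conversion is a trivial zip from an internal vector to a params dict; the real conversion also has to respect bounds, constraints, and algorithm-specific value post-processing.

```python
from concurrent.futures import ThreadPoolExecutor


def criterion(params: dict) -> float:
    # The user's criterion expects external params (here: a dict)
    return (params["a"] - 1.0) ** 2 + (params["b"] + 2.0) ** 2


def make_internal_fun(criterion, names):
    """Hypothetical stand-in for build_internal_fun: x -> (value, history_entry, log_entry)."""

    def internal_fun(x):
        params = dict(zip(names, x))  # internal vector -> external params
        value = criterion(params)
        # Pure evaluation: no logging here, and no step (the worker is unstepped)
        history_entry = {"params": params, "value": value, "step": None}
        log_entry = {"x": list(x), "value": value}
        return value, history_entry, log_entry

    return internal_fun


if __name__ == "__main__":
    fun = make_internal_fun(criterion, ["a", "b"])
    points = [[1.0, -2.0], [0.0, 0.0]]
    # Any executor works: the worker side only needs `fun`, not the optimizer
    with ThreadPoolExecutor() as pool:
        values = [value for value, _, _ in pool.map(fun, points)]
    print(values)  # [0.0, 5.0]
```

Handing `criterion` the raw vector `[1.0, -2.0]` would fail with a `TypeError`, which is the failure mode the commit message describes for workers given the user's criterion directly.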
A point evaluated on an unstepped worker arrives with step=None; the recording
process must attribute it to the running optimization step. The new test fails
if the re-stamp is dropped, locking the behavior the MPI driver/worker split
relies on.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
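The recording side can be sketched the same way: evaluation stays pure, and only the process that owns the history records each returned point, exactly once, stamped with the running step. `Recorder`, `record_batch`, and the dict-based entries are hypothetical illustrations, not optimagic's internal data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Recorder:
    """Hypothetical owner of history and log: the only place points are recorded."""

    current_step: int = 0
    history: list = field(default_factory=list)
    log: list = field(default_factory=list)

    def record_batch(self, results):
        """Record each (value, history_entry, log_entry) triple exactly once."""
        for _value, history_entry, log_entry in results:
            # Worker-side evaluation is pure and unstepped (step=None), so the
            # recording process stamps every point with its running step.
            # New dicts are built, so the workers' results are left unmodified.
            self.history.append({**history_entry, "step": self.current_step})
            self.log.append({**log_entry, "step": self.current_step})


if __name__ == "__main__":
    recorder = Recorder(current_step=4)
    worker_results = [
        (0.0, {"value": 0.0, "step": None}, {"x": [1.0, -2.0]}),
        (5.0, {"value": 5.0, "step": None}, {"x": [0.0, 0.0]}),
    ]
    recorder.record_batch(worker_results)
    print([entry["step"] for entry in recorder.history])  # [4, 4]
```

Dropping the re-stamp would leave `step=None` in the recorded history, which is the regression the test in this commit guards against.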