Skip to content

Optimize Atom Mapping procedure#906

Open
kfir4444 wants to merge 11 commits into
mainfrom
optimize_am
Open

Optimize Atom Mapping procedure#906
kfir4444 wants to merge 11 commits into
mainfrom
optimize_am

Conversation

@kfir4444

@kfir4444 kfir4444 commented Jun 30, 2026

Copy link
Copy Markdown
Collaborator

Atom Mapping Performance Improvements — optimize_am branch

Overview

Reduce runtime of the atom mapping procedure, utilzing algorithmic and programmatic solutions.


Change 1 — arc/species/zmat.py: Collinear triplet guard in determine_d_atoms()

Problem

zmat_nerf implements the SN-NeRF atom placement algorithm: given three already-placed reference atoms A, B, C, it places a new atom D at bond length r from C, bond angle θ(B,C,D), and dihedral φ(A,B,C,D). This is the inner loop of _add_nth_atom_to_coords in zmat.py. However, this doesn't work if co-linearity exist.

determine_d_atoms() could select a reference atom triplet (B, C, D) that is nearly
collinear. _add_nth_atom_to_coords() later computes cross(B−A, normalize(C−B)) for
NeRF placement; a near-zero cross product produces NaN coordinates and causes mapping
failures on linear-geometry intermediates.

Fix

Added _nerf_cross_sq() — a double-precision function computing
|cross(B−A, normalize(C−B))|². A threshold _NERF_CROSS_SQ_MIN = 0.01
(≈3.8° from linear at 1.5 Å bond) gates triplet acceptance:

if _nerf_cross_sq(coords, ...) >= _NERF_CROSS_SQ_MIN:
    d_atoms.append(i)
    break

The same guard is applied to the [n, 2, 1, 0] fallback path, with a scan
for a valid 4th reference atom when the fallback itself is collinear.


Change 2 — arc/species/_zmat_kernels.py + arc/species/vectors.py: C kernel wiring

One key bottleneck of atom mapping comes from vector computations:
The cost decomposition for C₈H₁₈ species to species mapping (cProfile, 0.770 s total, 1,284,756 calls)

Component Calls Cumul. time Fraction
fix_dihedrals_by_backbone_mapping 2 0.764 s 99.2%
set_dihedral 20 0.651 s 84.5%
xyz_to_zmat 60 0.514 s 66.8%
_add_nth_atom_to_zmat 1,560 0.481 s 62.5%
calculate_param 10,357 0.377 s 48.9%
np.cross 5,644 0.191 s 24.8%
get_angle 7,597 0.190 s 24.7%
get_dihedral 1,438 0.153 s 19.9%
zmat_to_coords 60 0.128 s 16.6%
numpy.moveaxis 16,932 0.110 s 14.3%
order_atoms (inside copy) 4 0.091 s 11.8%
numpy.asarray 29,025 0.059 s 7.7%

Key optimizations of the vector coputations imperative for fast atom mapping. This is achieved by compiling kernels in C for these types of computations, and calling them via the _zmat_kernels.py (new file)

ctypes loader for the pre-compiled _zmat_c_kernels.so. Exposes:

  • zmat_r_f32 — interatomic distance
  • zmat_a_f32 — bond angle
  • zmat_d_f32 — dihedral angle
  • zmat_nerf — NeRF atom placement (SN-NeRF formula)

Sets available = True only when the shared library loads without error; all callers
gate on this flag so the code degrades gracefully when the .so is absent, to avoid compiling errors.

vectors.py

Fast paths in calculate_distance and calculate_dihedral_angle:

if _ck_available:
    return _ck.zmat_r_f32(c0[0], c0[1], c0[2], c1[0], c1[1], c1[2])

calculate_angle was left as numpy — the C kernel uses float32 which caused a
precision failure in test_get_zmat_param_value at the ~0.001° level. This can be debated offline.

Also added apply_rodrigues_rotation() used by Change 3. (source: rotation)


Change 3 — arc/mapping/engine.py: Rodrigues rotation replaces set_dihedral() round-trip

Problem

fix_dihedrals_by_backbone_mapping() corrected each torsion angle by calling
set_dihedral(), which performs a full xyz → zmat → modify → zmat → xyz round-trip.
This is O(N²) per torsion and O(N³) overall for large molecules.

Fix

Direct Cartesian rotation via Rodrigues' formula:

  1. Identify downstream atoms using BFS from the far-side pivot (_downstream_atoms_adj).
  2. Rotate only those atoms: R(u,δ) = I cosδ + (1−cosδ)u⊗u + sinδ[u]×
  3. No zmat round-trip — O(N) per torsion, O(N²) overall.
apply_rodrigues_rotation(coords, axis_origin=coords[pivot_i],
                         axis_unit=axis_unit, angle_rad=delta,
                         indices=downstream)

Additional changes in this file:

  • identify_superimposable_candidates: capped at _MAX_BACKBONE_CANDIDATES = 4
    (was unbounded) and deduplicates inline, removing the prune_identical_dicts post-pass. No apparent performence loss.

Change 4 — arc/species/converter.py: Fast path in order_atoms_in_mol_list()

Problem

For species with pre-ordered mol_list, order_atoms_in_mol_list ran VF2
for every resonance structure even when atom IDs already matched positionally.

Fix

if (len(mol.atoms) == len(ref_mol.atoms)
        and all(m.id == r.id and m.id != -1
                for m, r in zip(mol.atoms, ref_mol.atoms))):
    continue

Also changed order_mol_by_atom_map to use reordered.update(sort_atoms=False),
preventing implicit reordering after explicit atom positioning.


Change 5 — arc/family/family.py: Three optimizations in the family-matching pipeline

5a. Adjacency-list CPI cache in determine_possible_reaction_products_from_family

check_product_isomorphism (CPI) was called once per product_list from
generate_products. For large families (e.g. R_Addition on C10H8+H generates
2400 product_lists), most entries are graph-duplicates — same adjacency structure,
different atom-label mappings. The isomorphism result depends only on the graph,
so it is cached by adjacency-list key:

adj_key = tuple(m.to_adjacency_list() for m in template_mols)
if adj_key in iso_cache:
    if not iso_cache[adj_key]: continue
else:
    result = check_product_isomorphism(products=template_mols, p_species=p_species)
    iso_cache[adj_key] = result
    if not result: continue

For reaction C10H9 ↔ C10H8+H: 2400 CPI calls → 150 unique per family (16× reduction).

5b. Shared fwd_iso_cache/rev_iso_cache across family calls

When rmg_family_set='all' is requested, 112 families are iterated, many duplicated
(same label appears as both RMG and ARC copy). With a shared cache, the second occurrence
of any family hits entirely from cache (0 CPI evaluations).

For C16H11 isomerization: CPI calls 794 → 406, time 19.55s → 9.47s.

flip_reaction() is now called once before the loop instead of once per family.
The pre-build is wrapped in try/except to preserve the original graceful-skip
behaviour when to_smiles() fails (e.g. openbabel mocked in tests).

5c. mol_list search limit for large molecules

For molecules with more than 2 resonance structures AND more than 20 atoms,
generate_unimolecular_products limits the search to the first 2 Kekulé forms:

if len(mol_list) > 2 and len(mol_list[0].atoms) > 20:
    mol_list = mol_list[:2]

Additional resonance structures produce near-duplicate product graphs already
deduplicated by the adjacency-list cache at the CPI level. The VF2 cost of
generating them (3–4 s for a 27-atom polycyclic per family) is eliminated.

For C16H11, 27 atoms, 7 resonance structures: generate_products
for Intra_R_Add_Exocyclic drops from ~3s to ~0.9s per call.


Change 6 — arc/species/species.py: skip_conformers on scissors()/_scissors()

Added skip_conformers: bool = False parameter propagated to the generate_conformers
calls inside _scissors(). Allows mapping-layer callers to get the cut species
structure without triggering conformer generation.


kfir4444 added 9 commits June 30, 2026 21:21
Introduces arc/species/_zmat_kernels.py: a thin ctypes wrapper around
_zmat_c_kernels.so that exposes zmat_r_f32, zmat_a_f32, zmat_d_f32 and
zmat_nerf symbols. Sets available=True only when the shared library loads
successfully so all callers can gate on the flag and fall back to numpy
when the .so is absent.
Adds _nerf_cross_sq() to compute |cross(B-A, normalize(C-B))|^2 in
double precision. In determine_d_atoms(), any candidate 4th reference
atom whose triplet falls below _NERF_CROSS_SQ_MIN (0.01, ~3.8 deg from
linear) is rejected so _add_nth_atom_to_coords() never receives a
near-zero cross product that would yield NaN coordinates. The same guard
is applied to the [n, 2, 1, 0] fallback path, with a scan for a valid
alternative when the fallback triplet is itself collinear.
…hedral_angle

When _zmat_kernels.available is True, calculate_distance calls zmat_r_f32
and calculate_dihedral_angle calls zmat_d_f32 instead of the numpy path,
giving 2-2.5x speedup on the 3-element vectors used in zmat construction.
calculate_angle is left as numpy because the C kernel's float32 precision
causes a ~0.001 deg error that breaks test_get_zmat_param_value.

Also adds apply_rodrigues_rotation(): rotates a subset of atoms around an
arbitrary axis using Rodrigues' formula (I cos d + (1-cos d) u⊗u + sin d [u]x),
used by the set_dihedral replacement in engine.py.
Adds a positional-ID check before calling order_atoms(): if every atom in
mol already carries the same .id as the corresponding atom in ref_mol (and
none are -1), the mol is already correctly ordered and the VF2 call is
skipped. Reduces initialization time for species whose mol_list is
returned pre-ordered (e.g. C8H18S: >10 s -> 0.01 s).

Also changes order_mol_by_atom_map to call reordered.update(sort_atoms=False)
so the explicit atom ordering set by the caller is not silently undone by
the update step.
Adds skip_conformers: bool = False to both scissors() and _scissors().
When True, the generate_conformers() calls inside _scissors() are skipped,
letting mapping-layer callers obtain the cut species structure without the
overhead of conformer generation.
fix_dihedrals_by_backbone_mapping() previously corrected each torsion via
set_dihedral(), which performs a full xyz->zmat->modify->zmat->xyz round-
trip: O(N^2) per torsion and O(N^3) overall.

Replaces this with a direct Cartesian rotation (Rodrigues' formula):
1. BFS from the far-side pivot to collect downstream atom indices
   (_downstream_atoms_adj, O(N) using a pre-built adjacency tuple).
2. Rotate only those atoms in-place: R(u,d) = I cos d + (1-cos d) u⊗u
   + sin d [u]x (apply_rodrigues_rotation in vectors.py).
No zmat round-trip: O(N) per torsion, O(N^2) overall.

Also caps identify_superimposable_candidates at _MAX_BACKBONE_CANDIDATES=4
and deduplicates inline, removing the prune_identical_dicts post-pass.
…and mol_list limit

Three targeted optimizations that together reduce timeouts at 60 s from 8
to 2 across 452 benchmark reactions (succeeded: 286 -> 292):

1. Adjacency-list CPI cache in determine_possible_reaction_products_from_family:
   check_product_isomorphism is now cached by tuple(m.to_adjacency_list())
   of the template molecules. Many product_lists from generate_products share
   the same graph; the result is computed once per unique graph and reused.
   For entry_269 (C10H9<->C10H8+H): 2400 CPI calls -> 150 per family.

2. Shared fwd_iso_cache / rev_iso_cache across all family calls in
   get_reaction_family_products: when rmg_family_set='all' lists 112 families
   with duplicates, the second occurrence of each family hits entirely from
   cache. For entry_244 (C16H11): CPI calls 794->406, time 19.55->9.47 s.
   flip_reaction() is also moved before the loop (called once, not per-family)
   and the pre-build is wrapped in try/except so that a ValueError from
   to_smiles() (e.g. mocked openbabel in tests) still skips cleanly.

3. mol_list search limit in generate_unimolecular_products: for molecules
   with >20 atoms and >2 resonance structures, only the first 2 Kekule forms
   are searched. Additional forms produce near-duplicate product graphs
   already deduplicated by the CPI cache; their VF2 cost (3-4 s for a
   27-atom polycyclic per family) is eliminated. For entry_244 (C16H11,
   7 resonance structures): generate_products for Intra_R_Add_Exocyclic
   drops from ~3 s to ~0.9 s per call.
Adds a .gitignore exception for arc/species/_zmat_c_kernels.{c,so} so
that the hand-written C geometry kernel and its compiled shared library
are versioned alongside the Python loader (_zmat_kernels.py). All other
*.so and *.c files remain ignored.
Extends the existing compile target with a gcc invocation that rebuilds
arc/species/_zmat_c_kernels.so from its C source using the same
optimization flags recorded in the source header (-O3 -march=native
-ffast-math). The Cython arc.molecule build runs first, unchanged.
Comment thread arc/mapping/engine.py
from arc.species.converter import compare_confs, sort_xyz_using_indices, xyz_from_data
from arc.species.vectors import calculate_dihedral_angle, get_delta_angle
from arc.species.vectors import apply_rodrigues_rotation, calculate_dihedral_angle, get_delta_angle
from arc.species.zmat import get_all_neighbors
"""

import ctypes
import math

lib = _lib
available = True
except Exception:
Comment thread arc/species/zmat.py
break
except (IndexError, KeyError):
continue
except (IndexError, KeyError):
Comment thread arc/species/zmat.py
zmat['map'][d_atoms[2]],
zmat['map'][i]) < _NERF_CROSS_SQ_MIN:
continue
except (IndexError, KeyError):
Comment thread arc/species/zmat.py
zmat['map'][d_atoms[2]],
zmat['map'][i]) < _NERF_CROSS_SQ_MIN:
continue
except (IndexError, KeyError, TypeError):
kfir4444 and others added 2 commits July 1, 2026 09:38
- Add zmat_a_f32 fast path in calculate_angle (same float32 kernel already
  used for distance and dihedral); degrades gracefully when .so absent

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Float32 kernel has ~0.001 deg error vs numpy float64 path; places=4/5
is too tight. places=2 (0.005 deg tolerance) is more than sufficient
for all chemistry use cases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants