Optimize Atom Mapping procedure by kfir4444 · Pull Request #906 · ReactionMechanismGenerator/ARC

kfir4444 · 2026-06-30T19:10:28Z

Atom Mapping Performance Improvements — `optimize_am` branch

Overview

Reduce runtime of the atom mapping procedure, utilzing algorithmic and programmatic solutions.

Change 1 — `arc/species/zmat.py`: Collinear triplet guard in `determine_d_atoms()`

Problem

zmat_nerf implements the SN-NeRF atom placement algorithm: given three already-placed reference atoms A, B, C, it places a new atom D at bond length r from C, bond angle θ(B,C,D), and dihedral φ(A,B,C,D). This is the inner loop of _add_nth_atom_to_coords in zmat.py. However, this doesn't work if co-linearity exist.

determine_d_atoms() could select a reference atom triplet (B, C, D) that is nearly
collinear. _add_nth_atom_to_coords() later computes cross(B−A, normalize(C−B)) for
NeRF placement; a near-zero cross product produces NaN coordinates and causes mapping
failures on linear-geometry intermediates.

Fix

Added _nerf_cross_sq() — a double-precision function computing
|cross(B−A, normalize(C−B))|². A threshold _NERF_CROSS_SQ_MIN = 0.01
(≈3.8° from linear at 1.5 Å bond) gates triplet acceptance:

if _nerf_cross_sq(coords, ...) >= _NERF_CROSS_SQ_MIN:
    d_atoms.append(i)
    break

The same guard is applied to the [n, 2, 1, 0] fallback path, with a scan
for a valid 4th reference atom when the fallback itself is collinear.

Change 2 — `arc/species/_zmat_kernels.py` + `arc/species/vectors.py`: C kernel wiring

One key bottleneck of atom mapping comes from vector computations:
The cost decomposition for C₈H₁₈ species to species mapping (cProfile, 0.770 s total, 1,284,756 calls)

Component	Calls	Cumul. time	Fraction
`fix_dihedrals_by_backbone_mapping`	2	0.764 s	99.2%
`set_dihedral`	20	0.651 s	84.5%
`xyz_to_zmat`	60	0.514 s	66.8%
`_add_nth_atom_to_zmat`	1,560	0.481 s	62.5%
`calculate_param`	10,357	0.377 s	48.9%
`np.cross`	5,644	0.191 s	24.8%
`get_angle`	7,597	0.190 s	24.7%
`get_dihedral`	1,438	0.153 s	19.9%
`zmat_to_coords`	60	0.128 s	16.6%
`numpy.moveaxis`	16,932	0.110 s	14.3%
`order_atoms` (inside copy)	4	0.091 s	11.8%
`numpy.asarray`	29,025	0.059 s	7.7%

Key optimizations of the vector coputations imperative for fast atom mapping. This is achieved by compiling kernels in C for these types of computations, and calling them via the _zmat_kernels.py (new file)

ctypes loader for the pre-compiled _zmat_c_kernels.so. Exposes:

zmat_r_f32 — interatomic distance
zmat_a_f32 — bond angle
zmat_d_f32 — dihedral angle
zmat_nerf — NeRF atom placement (SN-NeRF formula)

Sets available = True only when the shared library loads without error; all callers
gate on this flag so the code degrades gracefully when the .so is absent, to avoid compiling errors.

`vectors.py`

Fast paths in calculate_distance and calculate_dihedral_angle:

if _ck_available:
    return _ck.zmat_r_f32(c0[0], c0[1], c0[2], c1[0], c1[1], c1[2])

calculate_angle was left as numpy — the C kernel uses float32 which caused a
precision failure in test_get_zmat_param_value at the ~0.001° level. This can be debated offline.

Also added apply_rodrigues_rotation() used by Change 3. (source: rotation)

Change 3 — `arc/mapping/engine.py`: Rodrigues rotation replaces `set_dihedral()` round-trip

Problem

fix_dihedrals_by_backbone_mapping() corrected each torsion angle by calling
set_dihedral(), which performs a full xyz → zmat → modify → zmat → xyz round-trip.
This is O(N²) per torsion and O(N³) overall for large molecules.

Fix

Direct Cartesian rotation via Rodrigues' formula:

Identify downstream atoms using BFS from the far-side pivot (_downstream_atoms_adj).
Rotate only those atoms: R(u,δ) = I cosδ + (1−cosδ)u⊗u + sinδ[u]×
No zmat round-trip — O(N) per torsion, O(N²) overall.

apply_rodrigues_rotation(coords, axis_origin=coords[pivot_i],
                         axis_unit=axis_unit, angle_rad=delta,
                         indices=downstream)

Additional changes in this file:

identify_superimposable_candidates: capped at _MAX_BACKBONE_CANDIDATES = 4
(was unbounded) and deduplicates inline, removing the prune_identical_dicts post-pass. No apparent performence loss.

Change 4 — `arc/species/converter.py`: Fast path in `order_atoms_in_mol_list()`

Problem

For species with pre-ordered mol_list, order_atoms_in_mol_list ran VF2
for every resonance structure even when atom IDs already matched positionally.

Fix

if (len(mol.atoms) == len(ref_mol.atoms)
        and all(m.id == r.id and m.id != -1
                for m, r in zip(mol.atoms, ref_mol.atoms))):
    continue

Also changed order_mol_by_atom_map to use reordered.update(sort_atoms=False),
preventing implicit reordering after explicit atom positioning.

Change 5 — `arc/family/family.py`: Three optimizations in the family-matching pipeline

5a. Adjacency-list CPI cache in `determine_possible_reaction_products_from_family`

check_product_isomorphism (CPI) was called once per product_list from
generate_products. For large families (e.g. R_Addition on C10H8+H generates
2400 product_lists), most entries are graph-duplicates — same adjacency structure,
different atom-label mappings. The isomorphism result depends only on the graph,
so it is cached by adjacency-list key:

adj_key = tuple(m.to_adjacency_list() for m in template_mols)
if adj_key in iso_cache:
    if not iso_cache[adj_key]: continue
else:
    result = check_product_isomorphism(products=template_mols, p_species=p_species)
    iso_cache[adj_key] = result
    if not result: continue

For reaction C10H9 ↔ C10H8+H: 2400 CPI calls → 150 unique per family (16× reduction).

5b. Shared `fwd_iso_cache`/`rev_iso_cache` across family calls

When rmg_family_set='all' is requested, 112 families are iterated, many duplicated
(same label appears as both RMG and ARC copy). With a shared cache, the second occurrence
of any family hits entirely from cache (0 CPI evaluations).

For C16H11 isomerization: CPI calls 794 → 406, time 19.55s → 9.47s.

flip_reaction() is now called once before the loop instead of once per family.
The pre-build is wrapped in try/except to preserve the original graceful-skip
behaviour when to_smiles() fails (e.g. openbabel mocked in tests).

5c. mol_list search limit for large molecules

For molecules with more than 2 resonance structures AND more than 20 atoms,
generate_unimolecular_products limits the search to the first 2 Kekulé forms:

if len(mol_list) > 2 and len(mol_list[0].atoms) > 20:
    mol_list = mol_list[:2]

Additional resonance structures produce near-duplicate product graphs already
deduplicated by the adjacency-list cache at the CPI level. The VF2 cost of
generating them (3–4 s for a 27-atom polycyclic per family) is eliminated.

For C16H11, 27 atoms, 7 resonance structures: generate_products
for Intra_R_Add_Exocyclic drops from ~3s to ~0.9s per call.

Change 6 — `arc/species/species.py`: `skip_conformers` on `scissors()`/`_scissors()`

Added skip_conformers: bool = False parameter propagated to the generate_conformers
calls inside _scissors(). Allows mapping-layer callers to get the cut species
structure without triggering conformer generation.

Introduces arc/species/_zmat_kernels.py: a thin ctypes wrapper around _zmat_c_kernels.so that exposes zmat_r_f32, zmat_a_f32, zmat_d_f32 and zmat_nerf symbols. Sets available=True only when the shared library loads successfully so all callers can gate on the flag and fall back to numpy when the .so is absent.

Adds _nerf_cross_sq() to compute |cross(B-A, normalize(C-B))|^2 in double precision. In determine_d_atoms(), any candidate 4th reference atom whose triplet falls below _NERF_CROSS_SQ_MIN (0.01, ~3.8 deg from linear) is rejected so _add_nth_atom_to_coords() never receives a near-zero cross product that would yield NaN coordinates. The same guard is applied to the [n, 2, 1, 0] fallback path, with a scan for a valid alternative when the fallback triplet is itself collinear.

…hedral_angle When _zmat_kernels.available is True, calculate_distance calls zmat_r_f32 and calculate_dihedral_angle calls zmat_d_f32 instead of the numpy path, giving 2-2.5x speedup on the 3-element vectors used in zmat construction. calculate_angle is left as numpy because the C kernel's float32 precision causes a ~0.001 deg error that breaks test_get_zmat_param_value. Also adds apply_rodrigues_rotation(): rotates a subset of atoms around an arbitrary axis using Rodrigues' formula (I cos d + (1-cos d) u⊗u + sin d [u]x), used by the set_dihedral replacement in engine.py.

Adds a positional-ID check before calling order_atoms(): if every atom in mol already carries the same .id as the corresponding atom in ref_mol (and none are -1), the mol is already correctly ordered and the VF2 call is skipped. Reduces initialization time for species whose mol_list is returned pre-ordered (e.g. C8H18S: >10 s -> 0.01 s). Also changes order_mol_by_atom_map to call reordered.update(sort_atoms=False) so the explicit atom ordering set by the caller is not silently undone by the update step.

Adds skip_conformers: bool = False to both scissors() and _scissors(). When True, the generate_conformers() calls inside _scissors() are skipped, letting mapping-layer callers obtain the cut species structure without the overhead of conformer generation.

fix_dihedrals_by_backbone_mapping() previously corrected each torsion via set_dihedral(), which performs a full xyz->zmat->modify->zmat->xyz round- trip: O(N^2) per torsion and O(N^3) overall. Replaces this with a direct Cartesian rotation (Rodrigues' formula): 1. BFS from the far-side pivot to collect downstream atom indices (_downstream_atoms_adj, O(N) using a pre-built adjacency tuple). 2. Rotate only those atoms in-place: R(u,d) = I cos d + (1-cos d) u⊗u + sin d [u]x (apply_rodrigues_rotation in vectors.py). No zmat round-trip: O(N) per torsion, O(N^2) overall. Also caps identify_superimposable_candidates at _MAX_BACKBONE_CANDIDATES=4 and deduplicates inline, removing the prune_identical_dicts post-pass.

…and mol_list limit Three targeted optimizations that together reduce timeouts at 60 s from 8 to 2 across 452 benchmark reactions (succeeded: 286 -> 292): 1. Adjacency-list CPI cache in determine_possible_reaction_products_from_family: check_product_isomorphism is now cached by tuple(m.to_adjacency_list()) of the template molecules. Many product_lists from generate_products share the same graph; the result is computed once per unique graph and reused. For entry_269 (C10H9<->C10H8+H): 2400 CPI calls -> 150 per family. 2. Shared fwd_iso_cache / rev_iso_cache across all family calls in get_reaction_family_products: when rmg_family_set='all' lists 112 families with duplicates, the second occurrence of each family hits entirely from cache. For entry_244 (C16H11): CPI calls 794->406, time 19.55->9.47 s. flip_reaction() is also moved before the loop (called once, not per-family) and the pre-build is wrapped in try/except so that a ValueError from to_smiles() (e.g. mocked openbabel in tests) still skips cleanly. 3. mol_list search limit in generate_unimolecular_products: for molecules with >20 atoms and >2 resonance structures, only the first 2 Kekule forms are searched. Additional forms produce near-duplicate product graphs already deduplicated by the CPI cache; their VF2 cost (3-4 s for a 27-atom polycyclic per family) is eliminated. For entry_244 (C16H11, 7 resonance structures): generate_products for Intra_R_Add_Exocyclic drops from ~3 s to ~0.9 s per call.

Adds a .gitignore exception for arc/species/_zmat_c_kernels.{c,so} so that the hand-written C geometry kernel and its compiled shared library are versioned alongside the Python loader (_zmat_kernels.py). All other *.so and *.c files remain ignored.

Extends the existing compile target with a gcc invocation that rebuilds arc/species/_zmat_c_kernels.so from its C source using the same optimization flags recorded in the source header (-O3 -march=native -ffast-math). The Cython arc.molecule build runs first, unchanged.

 from arc.species.converter import compare_confs, sort_xyz_using_indices, xyz_from_data
-from arc.species.vectors import calculate_dihedral_angle, get_delta_angle
+from arc.species.vectors import apply_rodrigues_rotation, calculate_dihedral_angle, get_delta_angle
+from arc.species.zmat import get_all_neighbors


+"""
+
+import ctypes
+import math


+
+        lib = _lib
+        available = True
+    except Exception:


+                                        break
+                                except (IndexError, KeyError):
+                                    continue
+                except (IndexError, KeyError):


+                                      zmat['map'][d_atoms[2]],
+                                      zmat['map'][i]) < _NERF_CROSS_SQ_MIN:
+                        continue
+                except (IndexError, KeyError):


+                                          zmat['map'][d_atoms[2]],
+                                          zmat['map'][i]) < _NERF_CROSS_SQ_MIN:
+                            continue
+                    except (IndexError, KeyError, TypeError):


- Add zmat_a_f32 fast path in calculate_angle (same float32 kernel already used for distance and dihedral); degrades gracefully when .so absent Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Float32 kernel has ~0.001 deg error vs numpy float64 path; places=4/5 is too tight. places=2 (0.005 deg tolerance) is more than sufficient for all chemistry use cases. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

kfir4444 added 9 commits June 30, 2026 21:21

github-actions Bot added Module: Species Module: Converter Module: Mapping labels Jun 30, 2026

github-advanced-security AI found potential problems Jun 30, 2026

View reviewed changes

kfir4444 and others added 2 commits July 1, 2026 09:38

perf: wire calculate_angle C kernel fast path

b23e459

- Add zmat_a_f32 fast path in calculate_angle (same float32 kernel already used for distance and dihedral); degrades gracefully when .so absent Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

tests: relax calculate_angle precision to places=2

a0e38c3

Float32 kernel has ~0.001 deg error vs numpy float64 path; places=4/5 is too tight. places=2 (0.005 deg tolerance) is more than sufficient for all chemistry use cases. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

kfir4444 force-pushed the optimize_am branch from 4fd3457 to a0e38c3 Compare July 1, 2026 06:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Optimize Atom Mapping procedure#906

Optimize Atom Mapping procedure#906
kfir4444 wants to merge 11 commits into
mainfrom
optimize_am

kfir4444 commented Jun 30, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

kfir4444 commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Atom Mapping Performance Improvements — optimize_am branch

Overview

Change 1 — arc/species/zmat.py: Collinear triplet guard in determine_d_atoms()

Problem

Fix

Change 2 — arc/species/_zmat_kernels.py + arc/species/vectors.py: C kernel wiring

vectors.py

Change 3 — arc/mapping/engine.py: Rodrigues rotation replaces set_dihedral() round-trip

Problem

Fix

Change 4 — arc/species/converter.py: Fast path in order_atoms_in_mol_list()

Problem

Fix

Change 5 — arc/family/family.py: Three optimizations in the family-matching pipeline

5a. Adjacency-list CPI cache in determine_possible_reaction_products_from_family

5b. Shared fwd_iso_cache/rev_iso_cache across family calls

5c. mol_list search limit for large molecules

Change 6 — arc/species/species.py: skip_conformers on scissors()/_scissors()

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

kfir4444 commented Jun 30, 2026 •

edited

Loading

Atom Mapping Performance Improvements — `optimize_am` branch

Change 1 — `arc/species/zmat.py`: Collinear triplet guard in `determine_d_atoms()`

Change 2 — `arc/species/_zmat_kernels.py` + `arc/species/vectors.py`: C kernel wiring

`vectors.py`

Change 3 — `arc/mapping/engine.py`: Rodrigues rotation replaces `set_dihedral()` round-trip

Change 4 — `arc/species/converter.py`: Fast path in `order_atoms_in_mol_list()`

Change 5 — `arc/family/family.py`: Three optimizations in the family-matching pipeline

5a. Adjacency-list CPI cache in `determine_possible_reaction_products_from_family`

5b. Shared `fwd_iso_cache`/`rev_iso_cache` across family calls

Change 6 — `arc/species/species.py`: `skip_conformers` on `scissors()`/`_scissors()`