Optimize Atom Mapping procedure#906
Open
kfir4444 wants to merge 11 commits into
Open
Conversation
Introduces arc/species/_zmat_kernels.py: a thin ctypes wrapper around _zmat_c_kernels.so that exposes zmat_r_f32, zmat_a_f32, zmat_d_f32 and zmat_nerf symbols. Sets available=True only when the shared library loads successfully so all callers can gate on the flag and fall back to numpy when the .so is absent.
Adds _nerf_cross_sq() to compute |cross(B-A, normalize(C-B))|^2 in double precision. In determine_d_atoms(), any candidate 4th reference atom whose triplet falls below _NERF_CROSS_SQ_MIN (0.01, ~3.8 deg from linear) is rejected so _add_nth_atom_to_coords() never receives a near-zero cross product that would yield NaN coordinates. The same guard is applied to the [n, 2, 1, 0] fallback path, with a scan for a valid alternative when the fallback triplet is itself collinear.
…hedral_angle When _zmat_kernels.available is True, calculate_distance calls zmat_r_f32 and calculate_dihedral_angle calls zmat_d_f32 instead of the numpy path, giving 2-2.5x speedup on the 3-element vectors used in zmat construction. calculate_angle is left as numpy because the C kernel's float32 precision causes a ~0.001 deg error that breaks test_get_zmat_param_value. Also adds apply_rodrigues_rotation(): rotates a subset of atoms around an arbitrary axis using Rodrigues' formula (I cos d + (1-cos d) u⊗u + sin d [u]x), used by the set_dihedral replacement in engine.py.
Adds a positional-ID check before calling order_atoms(): if every atom in mol already carries the same .id as the corresponding atom in ref_mol (and none are -1), the mol is already correctly ordered and the VF2 call is skipped. Reduces initialization time for species whose mol_list is returned pre-ordered (e.g. C8H18S: >10 s -> 0.01 s). Also changes order_mol_by_atom_map to call reordered.update(sort_atoms=False) so the explicit atom ordering set by the caller is not silently undone by the update step.
Adds skip_conformers: bool = False to both scissors() and _scissors(). When True, the generate_conformers() calls inside _scissors() are skipped, letting mapping-layer callers obtain the cut species structure without the overhead of conformer generation.
fix_dihedrals_by_backbone_mapping() previously corrected each torsion via set_dihedral(), which performs a full xyz->zmat->modify->zmat->xyz round- trip: O(N^2) per torsion and O(N^3) overall. Replaces this with a direct Cartesian rotation (Rodrigues' formula): 1. BFS from the far-side pivot to collect downstream atom indices (_downstream_atoms_adj, O(N) using a pre-built adjacency tuple). 2. Rotate only those atoms in-place: R(u,d) = I cos d + (1-cos d) u⊗u + sin d [u]x (apply_rodrigues_rotation in vectors.py). No zmat round-trip: O(N) per torsion, O(N^2) overall. Also caps identify_superimposable_candidates at _MAX_BACKBONE_CANDIDATES=4 and deduplicates inline, removing the prune_identical_dicts post-pass.
…and mol_list limit Three targeted optimizations that together reduce timeouts at 60 s from 8 to 2 across 452 benchmark reactions (succeeded: 286 -> 292): 1. Adjacency-list CPI cache in determine_possible_reaction_products_from_family: check_product_isomorphism is now cached by tuple(m.to_adjacency_list()) of the template molecules. Many product_lists from generate_products share the same graph; the result is computed once per unique graph and reused. For entry_269 (C10H9<->C10H8+H): 2400 CPI calls -> 150 per family. 2. Shared fwd_iso_cache / rev_iso_cache across all family calls in get_reaction_family_products: when rmg_family_set='all' lists 112 families with duplicates, the second occurrence of each family hits entirely from cache. For entry_244 (C16H11): CPI calls 794->406, time 19.55->9.47 s. flip_reaction() is also moved before the loop (called once, not per-family) and the pre-build is wrapped in try/except so that a ValueError from to_smiles() (e.g. mocked openbabel in tests) still skips cleanly. 3. mol_list search limit in generate_unimolecular_products: for molecules with >20 atoms and >2 resonance structures, only the first 2 Kekule forms are searched. Additional forms produce near-duplicate product graphs already deduplicated by the CPI cache; their VF2 cost (3-4 s for a 27-atom polycyclic per family) is eliminated. For entry_244 (C16H11, 7 resonance structures): generate_products for Intra_R_Add_Exocyclic drops from ~3 s to ~0.9 s per call.
Adds a .gitignore exception for arc/species/_zmat_c_kernels.{c,so} so
that the hand-written C geometry kernel and its compiled shared library
are versioned alongside the Python loader (_zmat_kernels.py). All other
*.so and *.c files remain ignored.
Extends the existing compile target with a gcc invocation that rebuilds arc/species/_zmat_c_kernels.so from its C source using the same optimization flags recorded in the source header (-O3 -march=native -ffast-math). The Cython arc.molecule build runs first, unchanged.
| from arc.species.converter import compare_confs, sort_xyz_using_indices, xyz_from_data | ||
| from arc.species.vectors import calculate_dihedral_angle, get_delta_angle | ||
| from arc.species.vectors import apply_rodrigues_rotation, calculate_dihedral_angle, get_delta_angle | ||
| from arc.species.zmat import get_all_neighbors |
| """ | ||
|
|
||
| import ctypes | ||
| import math |
|
|
||
| lib = _lib | ||
| available = True | ||
| except Exception: |
| break | ||
| except (IndexError, KeyError): | ||
| continue | ||
| except (IndexError, KeyError): |
| zmat['map'][d_atoms[2]], | ||
| zmat['map'][i]) < _NERF_CROSS_SQ_MIN: | ||
| continue | ||
| except (IndexError, KeyError): |
| zmat['map'][d_atoms[2]], | ||
| zmat['map'][i]) < _NERF_CROSS_SQ_MIN: | ||
| continue | ||
| except (IndexError, KeyError, TypeError): |
- Add zmat_a_f32 fast path in calculate_angle (same float32 kernel already used for distance and dihedral); degrades gracefully when .so absent Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Float32 kernel has ~0.001 deg error vs numpy float64 path; places=4/5 is too tight. places=2 (0.005 deg tolerance) is more than sufficient for all chemistry use cases. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Atom Mapping Performance Improvements —
optimize_ambranchOverview
Reduce runtime of the atom mapping procedure, utilzing algorithmic and programmatic solutions.
Change 1 —
arc/species/zmat.py: Collinear triplet guard indetermine_d_atoms()Problem
zmat_nerf implements the SN-NeRF atom placement algorithm: given three already-placed reference atoms A, B, C, it places a new atom D at bond length r from C, bond angle θ(B,C,D), and dihedral φ(A,B,C,D). This is the inner loop of _add_nth_atom_to_coords in zmat.py. However, this doesn't work if co-linearity exist.
determine_d_atoms()could select a reference atom triplet (B, C, D) that is nearlycollinear.
_add_nth_atom_to_coords()later computescross(B−A, normalize(C−B))forNeRF placement; a near-zero cross product produces NaN coordinates and causes mapping
failures on linear-geometry intermediates.
Fix
Added
_nerf_cross_sq()— a double-precision function computing|cross(B−A, normalize(C−B))|². A threshold_NERF_CROSS_SQ_MIN = 0.01(≈3.8° from linear at 1.5 Å bond) gates triplet acceptance:
The same guard is applied to the
[n, 2, 1, 0]fallback path, with a scanfor a valid 4th reference atom when the fallback itself is collinear.
Change 2 —
arc/species/_zmat_kernels.py+arc/species/vectors.py: C kernel wiringOne key bottleneck of atom mapping comes from vector computations:
The cost decomposition for C₈H₁₈ species to species mapping (cProfile, 0.770 s total, 1,284,756 calls)
fix_dihedrals_by_backbone_mappingset_dihedralxyz_to_zmat_add_nth_atom_to_zmatcalculate_paramnp.crossget_angleget_dihedralzmat_to_coordsnumpy.moveaxisorder_atoms(inside copy)numpy.asarrayKey optimizations of the vector coputations imperative for fast atom mapping. This is achieved by compiling kernels in C for these types of computations, and calling them via the
_zmat_kernels.py(new file)ctypes loader for the pre-compiled
_zmat_c_kernels.so. Exposes:zmat_r_f32— interatomic distancezmat_a_f32— bond anglezmat_d_f32— dihedral anglezmat_nerf— NeRF atom placement (SN-NeRF formula)Sets
available = Trueonly when the shared library loads without error; all callersgate on this flag so the code degrades gracefully when the
.sois absent, to avoid compiling errors.vectors.pyFast paths in
calculate_distanceandcalculate_dihedral_angle:calculate_anglewas left as numpy — the C kernel uses float32 which caused aprecision failure in
test_get_zmat_param_valueat the ~0.001° level. This can be debated offline.Also added
apply_rodrigues_rotation()used by Change 3. (source: rotation)Change 3 —
arc/mapping/engine.py: Rodrigues rotation replacesset_dihedral()round-tripProblem
fix_dihedrals_by_backbone_mapping()corrected each torsion angle by callingset_dihedral(), which performs a fullxyz → zmat → modify → zmat → xyzround-trip.This is O(N²) per torsion and O(N³) overall for large molecules.
Fix
Direct Cartesian rotation via Rodrigues' formula:
_downstream_atoms_adj).R(u,δ) = I cosδ + (1−cosδ)u⊗u + sinδ[u]×Additional changes in this file:
identify_superimposable_candidates: capped at_MAX_BACKBONE_CANDIDATES = 4(was unbounded) and deduplicates inline, removing the
prune_identical_dictspost-pass. No apparent performence loss.Change 4 —
arc/species/converter.py: Fast path inorder_atoms_in_mol_list()Problem
For species with pre-ordered mol_list,
order_atoms_in_mol_listran VF2for every resonance structure even when atom IDs already matched positionally.
Fix
Also changed
order_mol_by_atom_mapto usereordered.update(sort_atoms=False),preventing implicit reordering after explicit atom positioning.
Change 5 —
arc/family/family.py: Three optimizations in the family-matching pipeline5a. Adjacency-list CPI cache in
determine_possible_reaction_products_from_familycheck_product_isomorphism(CPI) was called once perproduct_listfromgenerate_products. For large families (e.g. R_Addition on C10H8+H generates2400 product_lists), most entries are graph-duplicates — same adjacency structure,
different atom-label mappings. The isomorphism result depends only on the graph,
so it is cached by adjacency-list key:
For reaction C10H9 ↔ C10H8+H: 2400 CPI calls → 150 unique per family (16× reduction).
5b. Shared
fwd_iso_cache/rev_iso_cacheacross family callsWhen
rmg_family_set='all'is requested, 112 families are iterated, many duplicated(same label appears as both RMG and ARC copy). With a shared cache, the second occurrence
of any family hits entirely from cache (0 CPI evaluations).
For C16H11 isomerization: CPI calls 794 → 406, time 19.55s → 9.47s.
flip_reaction()is now called once before the loop instead of once per family.The pre-build is wrapped in
try/exceptto preserve the original graceful-skipbehaviour when
to_smiles()fails (e.g. openbabel mocked in tests).5c. mol_list search limit for large molecules
For molecules with more than 2 resonance structures AND more than 20 atoms,
generate_unimolecular_productslimits the search to the first 2 Kekulé forms:Additional resonance structures produce near-duplicate product graphs already
deduplicated by the adjacency-list cache at the CPI level. The VF2 cost of
generating them (3–4 s for a 27-atom polycyclic per family) is eliminated.
For C16H11, 27 atoms, 7 resonance structures:
generate_productsfor
Intra_R_Add_Exocyclicdrops from ~3s to ~0.9s per call.Change 6 —
arc/species/species.py:skip_conformersonscissors()/_scissors()Added
skip_conformers: bool = Falseparameter propagated to thegenerate_conformerscalls inside
_scissors(). Allows mapping-layer callers to get the cut speciesstructure without triggering conformer generation.