Make Regression Tests Pass by Default#2983
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as resolved.
This comment was marked as resolved.
|
|
When computing thermochemistry for polycyclic species, cycle sets were converted to lists using list(cycle_set). Python sets have non-deterministic iteration order due to hash randomization, causing different ring ordering across runs and inconsistent thermochemistry estimates. Fix: replace list(cycle_set) with sorted(cycle_set) in three locations: - Molecule.get_disparate_cycles() in molecule.py - Molecule.get_polycycles() in molecule.py - ThermoGroupAV.combine_cycles() in thermo.py Verification: test_polycyclic_nondeterminism.py reproduces and verifies deterministic behavior across multiple runs.
Multiple sources of non-determinism caused different edge models across consecutive runs, even with identical inputs: 1. VF2 vertex sorting in Graph.sort_vertices() used only connectivity values. Atoms with identical connectivity (e.g., equivalent ring carbons) had non-deterministic ordering. Added chemical tiebreaker (sorting_key attribute from Atom) to ensure deterministic sorting. This fix is inherited by all Graph subclasses (Molecule, Group, etc.). 2. descend_tree() in base.py matched multiple overlapping tree children (e.g., Nitro vs Nitrites) and picked next_node[0] from unsorted list, which had non-deterministic order due to hash randomization. Sorting by label was attempted but failed (alphabetical order picks Nitrites before Nitro, giving wrong match). Fixed by relying on deterministic tree construction order. 3. Species network processing in main.py used sets for species and objects_to_enlarge, causing non-deterministic iteration order. Sorted by label for deterministic processing. Verification: liquid_oxidation reproducibility check passes 5 consecutive runs with identical core/edge models (217 species, 1601 reactions).
Reaction.is_isomorphic() compared specific_collider using ==, which delegates to Species.__eq__() - an identity-only comparison (self is other). When two models are loaded from separate Chemkin files, the specific_collider (N2) species are distinct object instances despite being chemically identical, causing == to always return False. Fix: replace == with structural is_isomorphic() comparison for specific_collider. Verification: sulfur edge comparison passes 5 consecutive runs with 227 matched reactions and 0 unique reactions.
Two related fixes for deterministic vertex and tree node ordering: 1. Graph.sort_vertices(): moved the chemical tiebreaker (sorting_key attribute from Atom) from Molecule.sort_vertices() override into the parent Graph.sort_vertices(). All subclasses (Molecule, Group, etc.) now inherit deterministic sorting when connectivity values are identical. Vertex index serves as a final tiebreaker to avoid comparing vertex objects directly (GroupAtom has no __lt__). 2. Database.descend_tree(): removed alphabetical sorting of matching children. When multiple children match (e.g., Nitro vs Nitrites), sorting by label picks Nitrites (alphabetically first) instead of Nitro (tree order first, which is the intended match). The tree is constructed deterministically, so picking next_node[0] in tree order is both correct and deterministic.
f609e81 to
96cc1eb
Compare
The objects_to_enlarge list can contain tuples (PDepNetwork, Species) from process_pdep_networks, which don't have a .label attribute. The previous sort used a lambda that assumed all objects had .label, causing an AttributeError for superminimal regression test. Fix: use a sort key function that handles both regular objects (Species, Network) and tuples (PDepNetwork, Species) with appropriate label extraction.
Regression Testing Results
Detailed regression test results.Regression test aromatics:Reference: Execution time (DD:HH:MM:SS): 00:00:00:52 aromatics Passed Core Comparison ✅Original model has 15 species. aromatics Failed Edge Comparison ❌Original model has 106 species. Non-identical thermo! ❌
thermo: Thermo group additivity estimation: group(Cs-(Cds-Cds)CsCsH) + group(Cs-(Cds-Cds)CsCsH) + group(Cs-(Cds-Cds)(Cds-Cds)CsH) + group(Cs-(Cds-Cds)CsHH) + group(Cds-CdsCsCs) + group(Cds-CdsCsH) + group(Cds-CdsCsH) + group(Cds-CdsCsH) + polycyclic(s2_4_5_diene_1_5) + polycyclic(s3_4_5_ene_3) + polycyclic(s3_5_5_ene_1) - ring(Cyclobutene) - ring(Cyclopentene) - ring(Cyclopentane) + radical(cyclopentene-allyl) Non-identical thermo! ❌
thermo: Thermo group additivity estimation: group(Cs-(Cds-Cds)CsCsH) + group(Cs-(Cds-Cds)(Cds-Cds)CsH) + group(Cs-(Cds-Cds)(Cds-Cds)CsH) + group(Cs-CsCsHH) + group(Cds-CdsCsCs) + group(Cds-CdsCsH) + group(Cds-CdsCsH) + group(Cds-CdsCsH) + Estimated bicyclic component: polycyclic(s2_3_5_ane) - ring(Cyclopentane) - ring(Cyclopropane) + ring(Cyclopentene) + ring(Cyclopropane) + polycyclic(s2_3_6_ene_1) + polycyclic(s3_5_6_diene_1_5) - ring(Cyclopropane) - ring(Cyclopentene) - ring(Cyclohexene) + radical(cyclopentene-4) Non-identical thermo! ❌
thermo: Thermo group additivity estimation: group(Cs-(Cds-Cds)CsCsH) + group(Cs-(Cds-Cds)CsCsH) + group(Cs-(Cds-Cds)CsCsH) + group(Cs-CsCsHH) + group(Cds- Cds(Cds-Cds)Cs) + group(Cds-CdsCsH) + group(Cds-CdsCsH) + group(Cds-Cds(Cds-Cds)H) + polycyclic(s2_4_5_ene_1) + polycyclic(s3_4_5_ene_1) + polycyclic(s3_5_5_diene_0_4) - ring(Cyclobutane) - ring(Cyclopentene) - ring(Cyclopentene) + radical(bicyclo[2.1.1]hex-2-ene-C5) Non-identical thermo! ❌
thermo: Thermo group additivity estimation: group(Cs-(Cds-Cds)(Cds-Cds)(Cds-Cds)H) + group(Cds-Cds(Cds-Cds)(Cds-Cds)) + group(Cds-CdsCsH) + group(Cds-CdsCsH) + group(Cds-Cds(Cds-Cds)H) + group(Cds-Cds(Cds-Cds)H) + group(Cds-CdsCsH) + group(Cdd-CdsCds) + Estimated bicyclic component: polycyclic(s4_6_6_ane) - ring(Cyclohexane) - ring(Cyclohexane) + ring(124cyclohexatriene) + ring(1,4-Cyclohexadiene) Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Non-identical kinetics! ❌
kinetics: Errors occurred during edge comparison
ERROR conda.cli.main_run:execute(148): `conda run python scripts/checkModels.py aromatics-edge stable_regression_results/aromatics/chemkin/chem_edge_annotated.inp stable_regression_results/aromatics/chemkin/species_edge_dictionary.txt test/regression/aromatics/chemkin/chem_edge_annotated.inp test/regression/aromatics/chemkin/species_edge_dictionary.txt` failed. (See above for error)
|
|
@rwest I've taken a stab at getting the regression tests to pass by default, with the help of AI this go around. I think this actually fixes it, to boot. Please take a look when you have some time. |
This PR is two things: (1) an important effort towards fixing reproducibility issues that crop up in RMG and its regression tests and (2) an experiment for me in using AI to fix bugs.
All of the code changes in this PR were made by Qwen 3.6 27B in opencode. I gave the model a script that ran a given regression test twice and then used
scripts/checkModel.pyon the output to ensure reproducibility, then asked the model to fix any non-reproducible regression tests.Description
Our regression tests 'fail' by default - in reality, RMG just doesn't always make consistent estimates of thermochemistry (and consequently kinetics). This PR resolves it.
sulfurThe sulfur regression test typically fails with:
Obviously this is not actually a failure, we just aren't checking equality correctly. This was resolved in 8b52c36 by checking for isomorphism rather than equality (there may be a better way to do this, read the extended description of that commit to understand why it was done this way).
aromaticsThe test typically fails with:
which is due to a non-deterministic choice in ring decomposition. PR #2920 partially resolved this, and the remaining issue was fixed here: 9156e75 - all that was required was changing
listtosorted(sortedreturns a list) to ensure that lists of component cycles are always in the same order.liquid_oxidationandRMS_CSTR_liquid_oxidationBoth of these often failed, typically with missing reactions and species in the edge (and sometimes core, too):
This was resolved in the remaining commits, which are a bit messy. From what I understand it is also related to sorting, but now with regard to the order in which species are added to the model. Would appreciate some further insight on this one.