Multivariate analysis rework, dendrogram/sample-correlation QC tools, and cleanup#8

Merged
robertsamples merged 18 commits into main from todo-cleanup on Jun 30, 2026


@robertsamples

Owner

This branch bundles several related improvements that grew out of working through the main.py TODO list.

## Multivariate Analysis tab rework

  • The tab previously labeled "PCA" actually only ever ran NMDS. It now genuinely supports PCA, NMDS, and PLS-DA, selectable from a dropdown, with a Scores/Loadings view toggle.
  • PCA/PLS-DA now properly scale features before fitting — previously a handful of high-abundance compounds were dominating the results.
  • "Collapse Technical Replicates" actually works now (it was silently disabled before).

## Dendrogram quality-control coloring

  • Branches are now colored to show clustering quality: green where replicates/groups cluster cleanly, magenta at the exact point where two groups' replicates overlap (chosen over the more conventional red so the colors stay distinguishable for colorblind users).
  • Added controls for the view (technical vs. biological replicates), coloring on/off, bootstrap support on/off, and a "Use Sample/Group Names" option for more readable labels.
  • Fixed an alignment bug where the bootstrap confidence labels (AU/BP) would overlap and become illegible on larger datasets.

## Sample Correlation Matrix

  • Added a Method selector (Spearman, Jaccard, or Bray-Curtis) and a View selector (individual injections, biological replicates, or biological groups), plus the same "Use Sample/Group Names" labeling option as the dendrogram.
  • Fixed the heatmap shrinking on repeated redraws and corrected the color scale.
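The three Method options are standard measures. Below is a minimal pure-Python sketch, assuming each sample is a list of feature intensities aligned by feature index. The function names are hypothetical, and whether the app displays Jaccard and Bray-Curtis as similarities or dissimilarities is not stated here, so one common form of each is shown.

```python
def rank(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(u, v):
    """Spearman correlation: the Pearson correlation of the average ranks."""
    ru, rv = rank(u), rank(v)
    n = len(u)
    mu, mv = sum(ru) / n, sum(rv) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(ru, rv))
    su = sum((a - mu) ** 2 for a in ru) ** 0.5
    sv = sum((b - mv) ** 2 for b in rv) ** 0.5
    if su == 0 or sv == 0:
        return float("nan")  # undefined when either sample is constant
    return cov / (su * sv)


def jaccard(u, v):
    """Jaccard similarity of the sets of detected (nonzero) features."""
    a = {i for i, x in enumerate(u) if x > 0}
    b = {i for i, x in enumerate(v) if x > 0}
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity sum|u - v| / sum(u + v), for nonnegative data."""
    total = sum(x + y for x, y in zip(u, v))
    return sum(abs(x - y) for x, y in zip(u, v)) / total if total else 0.0
```

A full matrix is then, for example, `[[spearman(a, b) for b in samples] for a in samples]`. Jaccard and Bray-Curtis assume nonnegative intensities, which is the usual case for peak tables.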

## UpSet plot and treemap

  • These now render directly as live plots instead of round-tripping through a saved image file, so they get the same zoom/pan/save controls as the other plots.

## Other cleanup

  • Closed out several main.py TODO items: added a check that warns users to run an analysis before using search, removed unused imports, clarified stale notes, and laid groundwork for reordering plot groups.
  • Reorganized the documentation site: removed a redundant contributor-only page (contributor notes now live in devnotes.md in the repo), and brought plot documentation up to date with all of the above.

Compatibility: No changes to the saved .mpct file format; old save files still load correctly.

Testing: All 159 headless unit tests pass; GUI changes verified manually against example data.

robertsamples and others added 18 commits June 29, 2026 01:39
@robertsamples robertsamples merged commit fa09d64 into main Jun 30, 2026
7 checks passed
robertsamples added a commit that referenced this pull request Jun 30, 2026
… and cleanup (#8)

* Reword and reprioritize main.py TODO list

* Add search-tab run-check, clean up main.py's dead imports

- goto_search now tells the user to run an analysis first instead of
  silently doing nothing when the search tab is opened before
  self.analysisrun is set, closing the "add runcheck before searching"
  TODO.
- Removed main.py's unused PyQt5/stdlib/groupsets imports (platform,
  GroupSet, several never-referenced Qt classes), verified via pyflakes
  + grep cross-check; no behavior change, 130 existing tests still pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Add GroupSetModel.move() for groupset reordering (model layer only)

Lays the Qt-free foundation for the "groups can be reordered" TODO:
GroupSetModel.move(from_index, to_index) reorders groupsets and keeps
the selection on the moved/shifted item by identity (not GroupSet's
value-based __eq__, since two freshly-added default groupsets compare
equal). 8 new tests in test_groupsets.py.
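The identity-versus-equality point can be sketched with stand-in classes. `GroupSetStub` and `GroupSetModelSketch` are hypothetical, not the repo's `GroupSet`/`GroupSetModel`. The dataclass gives value-based `__eq__`, so two fresh defaults compare equal, which is why the selection is re-found with `is` rather than `list.index()`.

```python
from dataclasses import dataclass, field


@dataclass
class GroupSetStub:
    """Hypothetical stand-in for GroupSet: @dataclass gives value-based __eq__."""
    name: str = "New Group"
    members: list = field(default_factory=list)


class GroupSetModelSketch:
    """Hypothetical model: an ordered list of groupsets plus a selected index."""

    def __init__(self):
        self.groupsets = []
        self.selected = None  # index into self.groupsets, or None

    def add(self, gs):
        self.groupsets.append(gs)
        self.selected = len(self.groupsets) - 1

    def move(self, from_index, to_index):
        # Remember the selected *object*, not its value.
        sel_obj = None if self.selected is None else self.groupsets[self.selected]
        item = self.groupsets.pop(from_index)
        self.groupsets.insert(to_index, item)
        if sel_obj is not None:
            # Identity lookup: list.index() would use __eq__ and could return
            # an equal-but-different groupset (e.g. two fresh defaults).
            self.selected = next(
                i for i, gs in enumerate(self.groupsets) if gs is sel_obj
            )
```

With two equal defaults `a`, `b` and a third groupset `c` moved to the front, a selection on `b` follows `b` to index 2, whereas `list.index(b)` would have landed on `a` at index 1.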

UI wiring (drag-and-drop on listWidget_pltgrps) intentionally left for
later -- it would need to be verified against a live GUI session to
confirm it interacts correctly with updatesets()'s existing
blockSignals dance, which isn't something to ship unverified.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Clarify stale PCA TODO note

PCA itself already exists (plot_PCA/goto_pca/checkbox field) -- the
remaining gap is specifically loadings/biplot visualization of which
features drive each component, which plot_PCA doesn't do yet. Reworded
so the TODO doesn't read as if PCA support is still missing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Add Qt-free multivariate ordination backend (PCA/NMDS/PLS-DA)

New code/ordination.py: PCA, NMDS (the metric-MDS-warm-started
non-metric MDS already used by the soon-to-be-renamed plot_PCA, kept
verbatim), and PLS-DA, plus a Qt-free port of the data-loading/
technical-replicate-collapsing logic the plot currently has hardcoded
off (plotting.py: `parent.collapsereps = False#...`).

The collapse-replicate logic is a near-verbatim port of the original,
not a rewrite -- its header-juggling (round-tripping through a CSV to
relabel an unstack() result) is easy to get subtly wrong by inspection
alone, so it's preserved as-is and verified empirically instead:
test_ordination.py constructs a synthetic peak table with 3 samples
across 2 biological groups, 3 technical-replicate injections each, and
asserts collapsing lands on exactly 3 rows (one per Sample) -- not 9
(uncollapsed) and not 2 (would mean biological replicates got merged
too). Cross-checked against real example data with a scratch script
(27 injections / 9 samples / 3 groups -> collapses to 9, not 3).
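For intuition, collapsing amounts to grouping injections by (Biolgroup, Sample) and aggregating each feature. This is a plain-Python sketch, not the ported unstack()/CSV logic. It assumes the aggregate is a per-feature mean (the commit does not say which aggregate is used), and `collapse_technical_replicates` is a hypothetical name. Mirroring the test described above, 9 injections across 3 samples in 2 groups collapse to 3 rows.

```python
from collections import defaultdict
from statistics import mean


def collapse_technical_replicates(rows):
    """Average technical-replicate injections into one row per Sample.

    rows: iterable of (biolgroup, sample, injection, intensities) tuples, where
    intensities is a list of feature values aligned across all rows.
    Returns {(biolgroup, sample): averaged_intensities}.
    """
    grouped = defaultdict(list)
    for biolgroup, sample, _injection, intensities in rows:
        # Keying on (group, sample) keeps biological replicates separate;
        # keying on the group alone would wrongly merge them as well.
        grouped[(biolgroup, sample)].append(intensities)
    return {
        key: [mean(col) for col in zip(*reps)] for key, reps in grouped.items()
    }
```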

Also caught and fixed a real bug while validating against real data:
PLSRegression's default scale=True standardizes X internally, so the
original explained-variance-ratio calc (component score variance /
unscaled total variance) silently produced ratios around 1e-6 instead
of the ~0.7 a well-separated dataset should show. Fixed with
scale=False, matching PCA's plain-centered treatment.
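The denominator mismatch is easy to reproduce with numpy alone. This sketch computes only the first PLS component (weight vector = leading left singular vector of X^T Y, the standard first PLS direction), not scikit-learn's full `PLSRegression`; the data are synthetic and the function name is hypothetical. Dividing standardized-score variance by the raw total variance gives a ratio many orders of magnitude too small, while keeping numerator and denominator on the same scaling gives a sensible fraction.

```python
import numpy as np


def first_pls_ratio(X, Y, denom=None):
    """Share of total X variance carried by the first PLS component's scores.

    X: (n_samples, n_features); Y: (n_samples, n_classes) one-hot matrix.
    The first PLS weight vector w is the leading left singular vector of
    Xc.T @ Yc, and the scores are t = Xc @ w. `denom` chooses whose total
    variance goes in the denominator (defaults to X itself).
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    u, _s, _vt = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    t = Xc @ u[:, 0]                 # unit-norm weights -> first-component scores
    D = X if denom is None else denom
    return t.var(ddof=1) / D.var(axis=0, ddof=1).sum()


rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 10)       # two classes, 10 samples each
Y = np.eye(2)[labels]                # one-hot class matrix for PLS-DA
# Two class-separating features on wildly different scales (synthetic).
X_raw = np.column_stack([
    labels * 5.0 + rng.normal(0.0, 1.0, 20),
    labels * 20000.0 + rng.normal(0.0, 5000.0, 20),
])
# X_std plays the role of an internal standardization step like scale=True.
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0, ddof=1)

consistent = first_pls_ratio(X_std, Y)         # scaled scores / scaled total
mismatched = first_pls_ratio(X_std, Y, X_raw)  # scaled scores / raw total (the bug)
```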

OPLS-DA is intentionally not implemented (see ordination.py's module
docstring) -- no scikit-learn support, and the alternatives (an
unmaintained third-party package, or a from-scratch implementation
with no reference dataset to validate against) are both riskier than
shipping PCA/NMDS/PLS-DA now and revisiting OPLS-DA later.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Rework the mislabeled "PCA" plot into a multivariate ordination tab

plot_PCA only ever ran NMDS (with a PCA rotation applied afterward
purely to orient axes) -- renamed to plot_ordination and reworked to
genuinely support PCA, NMDS, and PLS-DA, selectable via a combo-box
switcher bar inserted above the plot canvas (same runtime
widget-substitution pattern as searchtree.py's filter bar), plus a
Scores/Loadings view toggle. The math moved to the new Qt-free
ordination.py (previous commit); this is the Qt plumbing on top of it.

- Axis labels now show percent-variance-explained where meaningful
  (PCA/PLS-DA: real feature-space variance; NMDS: labeled distinctly
  as embedding variance, since it isn't the same quantity).
- Loadings view shows the top-25 features by vector magnitude as
  origin-anchored arrows (thousands of features can't all be drawn
  legibly) -- but whichever feature is currently highlighted elsewhere
  in the app is always included regardless of magnitude, via a second
  pre-created highlight artist (plot_ordination.highlight_loading(),
  called from MainWindow._refresh_highlight()) following the same
  convention every other plot's highlight marker already uses.
- Restored "Collapse Technical Replicates": plotting.py previously had
  this hardcoded off (`parent.collapsereps = False#...isChecked()`);
  now reads the real checkbox via ordination.load_ordination_matrix().
- checkBox_pca's visible text/btn_pca's tooltip changed from "PCA" to
  "Multivariate" -- the underlying objectName/analysis_params.PCA
  attribute are unchanged for .mpct save-file compatibility.
- Verified the view/method-switching lifecycle (Scores<->Loadings,
  method changes, the highlight-on-demand path for a feature outside
  the default top-25) against real example data with an offscreen Qt
  harness before considering this done -- in particular confirmed
  ui_plot.reset()'s mpl_disconnect(self.event) doesn't error when
  switching away from Scores view (where the pick-event connection
  lives) and back.
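Choosing which arrows to draw can be sketched as a magnitude ranking over the two displayed components, with the highlighted feature appended if it missed the cut. `top_loading_indices` is a hypothetical helper, not the app's `top_loadings()`; it already restricts the magnitude to the displayed components, which is the behavior a later commit in this PR adopts.

```python
import numpy as np


def top_loading_indices(loadings, k=25, highlight=None, components=(0, 1)):
    """Indices of the k features with the largest loading-vector magnitude.

    loadings: (n_features, n_components) array. Only the displayed
    `components` contribute to the magnitude, so a feature that is large on
    an undisplayed component cannot crowd out one prominent in this view.
    `highlight` (a feature index or None) is always included.
    """
    shown = loadings[:, list(components)]
    magnitude = np.hypot(shown[:, 0], shown[:, 1])  # arrow length in the 2-D view
    # Sort descending by magnitude and keep the first k.
    idx = [int(i) for i in np.argsort(magnitude)[::-1][:k]]
    if highlight is not None and highlight not in idx:
        idx.append(highlight)
    return idx
```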

devnotes.md documents all of the above plus the OPLS-DA deferral
(unmaintained pyopls package, or a from-scratch implementation with no
reference dataset to validate against -- both riskier than shipping
PCA/NMDS/PLS-DA now).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix ordination feedback: scaling, axis limits, NMDS %explained, bar styling

All four issues caught only by checking against real data / the live
GUI, not by inspection:

- PCA/PLS-DA now autoscale features (mean-center + unit-variance) before
  fitting. Raw mass-spec intensities span a huge range across features
  (confirmed: feature std devs from ~1.8 to ~10,000 on real example
  data) -- without scaling, a handful of high-abundance features
  dominated both explained-variance and loadings, which is why loadings
  were showing up "in the thousands" while most were tiny, and why
  %explained looked unusually high. NMDS is deliberately left
  unscaled (Bray-Curtis dissimilarity is conventionally computed on
  raw/relative abundances).
- NMDS axis labels no longer show percent-explained at all -- it's a
  rank-based embedding, not a linear decomposition, so it doesn't
  canonically have that quantity the way PCA/PLS-DA do. Shows stress
  (the conventional NMDS fit-quality metric) as the plot title instead.
- Loadings-view axis limits are now set explicitly from the actually-
  plotted data: ax.annotate()'s arrows don't reliably drive matplotlib's
  autoscale the way ax.scatter()/ax.plot() do (confirmed empirically --
  plotted points could fall outside the auto-picked view), which is
  what required manually rescaling each axis before. Also fixed
  top_loadings() being called against the full (up to 10-component)
  loadings instead of just the 2 displayed ones, which could let an
  irrelevant-to-this-view feature crowd out a genuinely prominent one.
- Switcher bar: capped to a fixed max height so it doesn't eat canvas
  space, and restyled for page_pca's light background (white combo
  boxes, dark text) instead of searchtree.py's dark-theme styling,
  which was the wrong context here.
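Autoscaling followed by PCA can be sketched in plain numpy via the SVD of the centered matrix. This is illustrative only, not the app's implementation, and the synthetic feature standard deviations loosely echo the ~1.8 to ~10,000 spread reported above. Without scaling, PC1 is essentially the single largest-variance feature; after autoscaling, no feature dominates by magnitude alone. NMDS would skip `autoscale` entirely, per the note above.

```python
import numpy as np


def autoscale(X):
    """Mean-center each feature and divide by its sample standard deviation."""
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant features at zero instead of dividing by 0
    return (X - X.mean(axis=0)) / std


def pca(X, n_components=2):
    """PCA via SVD of the centered matrix; returns (scores, loadings, ratios)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)        # fraction of total variance per PC
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T          # (n_features, n_components)
    return scores, loadings, explained[:n_components]


rng = np.random.default_rng(1)
# Four independent features with standard deviations from 1.8 to 10,000.
X = rng.normal(0.0, 1.0, size=(30, 4)) * np.array([1.8, 5.0, 20.0, 10000.0])

_, raw_loadings, raw_ratio = pca(X)
_, scaled_loadings, scaled_ratio = pca(autoscale(X))
```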

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Add dendrogram purity coloring: technical/biological replicate QC view

New clusterpurity.py colors dendrogram branches green wherever a whole
group's leaves merge together before meeting any other group, plus a
Technical/Biological Replicates switcher on the dendrogram tab (mirrors
plot_ordination's method/view bar) and a plot-title purity summary
(n_pure/n_total). Applies to both the regular and bootstrap (PvClust)
dendrogram paths.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Dendrogram: polyphyletic branches in red, add a no-coloring option

Purity coloring now uses red (not black) for branches that mix more
than one group, and a new "Color: None" mode reproduces the tab's
pre-purity-coloring appearance exactly (plain black, no title) --
fixes a regression where dropping color_threshold=0 made "None" fall
back to scipy's default multi-color palette instead of plain black.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Replace treemap/upset PNG round-trip with real canvas plots

gen_treemap/gen_upsetplt used to savefig() a PNG to the repo root and
load it into a QLabel via QPixmap -- no zoom/pan/save toolbar, and a
flat raster rewritten on every run. plot_treemap/plot_upset now draw
directly onto a persistent FigureCanvas, wired into _generate_plots()
via the same _create_or_reset pattern every other plot uses, so they
regenerate on both a fresh analysis and the Apply button (previously
only on a fresh analysis).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Dendrogram: bridge-only red coloring; move bootstrap/collapse checkboxes to per-plot bars

- clusterpurity.purity_link_color_func now distinguishes pure (green),
  bridge (red -- the specific merge where a new group first meets an
  existing one), and neutral (black -- combining two already-impure
  clades, no new information). Previously every ancestor of a single
  mixing event also rendered red, painting most of the tree's upper
  structure red regardless of how localized the actual mixing was.

- "Bootstrap Analysis" and "Collapse Technical Replicates" moved off
  the global plot-config dialog (where each only ever affected one
  plot) onto that plot's own switcher bar: plot_dendrogram gets a
  "Bootstrap" checkbox, plot_ordination gets a "Collapse Replicates"
  checkbox. The now-orphaned dialog widgets are hidden at runtime
  (not edited out of the generated ui_plotparam.py); 'bootstrap' is
  dropped from paramfields.CHECKBOX_FIELDS since it's no longer
  pickled, consistent with the dendrogram/ordination tabs' other
  per-session-only view state.

- Delete code/treemap.png and code/test_upsetplt.png: dead tracked
  files left over from before the canvas-based rendering change --
  nothing reads or writes them anymore.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix dendrogram coloring: red = proven non-monophyly (label-set overlap)

The previous "bridge vs neutral" heuristic still mis-colored real
data: it could mark a high-level merge red just because one side was
a single freshly-introduced pure clade, and could miss genuine
tangles where two already-impure children share a label without
either side being trivially pure.

purity_link_color_func now classifies each merge by comparing the two
children's label sets directly: disjoint sets (no label in common) ->
neutral/black, a clean join even if one side is impure from an
unrelated tangle further down; overlapping sets -> red, definitive
proof some label's leaves are split across this exact merge. Verified
against the real dataset's bootstrap dendrogram: only the actual
scattered-replicate merges render red, and every higher-level merge
joining that region with cleanly-resolved samples stays black.
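The label-set rule can be sketched as a function over a SciPy-style linkage matrix, where row i merges two cluster ids into id n + i. `purity_link_colors` is hypothetical, not `clusterpurity.purity_link_color_func`; the returned callable is shaped for the `link_color_func` argument of `scipy.cluster.hierarchy.dendrogram`, which is called with a non-leaf cluster id. The overlap color is a parameter because the PR description reports magenta being used instead of red in the final version.

```python
def purity_link_colors(Z, leaf_labels, pure="green", overlap="red", neutral="black"):
    """Classify every merge in a linkage matrix by its children's label sets.

    Z: rows [left_id, right_id, distance, count]; row i creates cluster id
    n + i, where n = len(leaf_labels) and ids below n are leaves.
    Returns a function mapping a non-leaf cluster id to a color.
    """
    n = len(leaf_labels)
    labels = {i: {lab} for i, lab in enumerate(leaf_labels)}  # cluster id -> label set
    colors = {}
    for i, (left, right, _dist, _count) in enumerate(Z):
        a, b = labels[int(left)], labels[int(right)]
        merged = a | b
        labels[n + i] = merged
        if len(merged) == 1:
            colors[n + i] = pure      # every leaf below shares one label
        elif a & b:
            colors[n + i] = overlap   # a label's leaves are split across this merge
        else:
            colors[n + i] = neutral   # clean join of disjoint label sets

    def link_color(k):
        return colors[k]

    return link_color
```

Usage might look like `dendrogram(Z, labels=names, link_color_func=purity_link_colors(Z, groups))`, with `Z` from `scipy.cluster.hierarchy.linkage`.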

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Dendrogram: add Use Sample/Group Names labels; fix AU/BP label scaling

- ordination.replicate_label_components() numbers each Injection's
  biological and technical replicate rank (BioRep#/TechRep#) within
  its Biolgroup/Sample, unconditionally (works fine when either count
  is 1). A new "Use Sample/Group Names" checkbox in the dendrogram's
  switcher bar swaps the raw file/injection names for
  <Biolgroup>_b<BioRep#>_s<TechRep#> (or _b<BioRep#> alone in the
  Biological Replicates view) -- useful when the real file names are
  long or uninformative.

- pvclust.plot_dendrogram's AU/BP annotations used a fixed icoord-unit
  x-shift that shrank to an ever-smaller pixel gap as leaf count grew
  (icoord-to-pixel ratio shrinks with more leaves in the same plot
  width), eventually merging "AU"/"BP" into illegible overlapping
  text. Fixed with ax.annotate(..., textcoords='offset points',
  ha='right'/'left'), which keeps a constant pixel gap regardless of
  leaf count, plus leaf-count-scaled fontsize. Also removed a
  plt.figure()/plt.tight_layout() pair that created and abandoned an
  unused Figure on every redraw.
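Both bullets can be sketched in a few lines. `replicate_labels` is a hypothetical stand-in for `ordination.replicate_label_components()` plus the label formatting, and it assumes replicates are ranked in order of first appearance. `pixel_gap` illustrates why a fixed icoord shift failed: assuming SciPy's default layout of 10 icoord units per leaf, the on-screen gap shrinks in inverse proportion to leaf count, whereas `textcoords='offset points'` fixes the gap in screen units.

```python
def replicate_labels(injections, biological_view=False):
    """Build <Biolgroup>_b<BioRep#>_s<TechRep#> labels for each injection.

    injections: list of (biolgroup, sample, injection_name) in display order.
    BioRep# ranks a Sample within its Biolgroup and TechRep# ranks an
    injection within its Sample, both 1-based in order of first appearance
    (an assumption; the real helper's ordering rule may differ).
    """
    bio_rank = {}    # (biolgroup, sample) -> BioRep#
    bio_count = {}   # biolgroup -> samples seen so far
    tech_count = {}  # (biolgroup, sample) -> injections seen so far
    labels = {}
    for group, sample, injection in injections:
        key = (group, sample)
        if key not in bio_rank:
            bio_count[group] = bio_count.get(group, 0) + 1
            bio_rank[key] = bio_count[group]
        tech_count[key] = tech_count.get(key, 0) + 1
        label = f"{group}_b{bio_rank[key]}"
        if not biological_view:
            label += f"_s{tech_count[key]}"
        labels[injection] = label
    return labels


def pixel_gap(shift_icoord, n_leaves, axes_width_px):
    """Screen gap produced by a fixed x-shift given in dendrogram icoord units.

    Assumes the x-axis spans 10 * n_leaves icoord units across the axes width.
    """
    return shift_icoord * axes_width_px / (10 * n_leaves)
```

On a 600-pixel-wide axes, the same 5-unit shift gives a 30-pixel gap at 10 leaves but only 3 pixels at 100 leaves.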

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Docs: update mkdocs guide for ordination rework and dendrogram improvements

- multivariate.md: rewrite from NMDS-only to cover the full PCA/NMDS/PLS-DA
  ordination tab — method switcher, scores/loadings view, collapse-replicates
  checkbox, stress metric for NMDS, %explained for PCA/PLS-DA.
- group-analysis.md: document all four dendrogram switcher-bar controls
  (View, Color, Bootstrap, Use Sample/Group Names).
- changelog.md: add 2026 entries for ordination rework, dendrogram purity
  coloring/switchers/label options, AU/BP annotation fix, and canvas-plot
  UpSet/treemap.
- development.md: add ordination.py, clusterpurity.py, csvcache.py to the
  hand-written-code list; add ordination to the test-coverage list.
- index.md: expand the "mid-2026 updates" note to mention ordination and
  dendrogram reworks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* correlation matrix control improvements

* Update tests.yml

sklearn was not installed in the CI test workflow, which caused the CI build/test failure.

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
@robertsamples robertsamples deleted the todo-cleanup branch June 30, 2026 21:28