SOLR-18284: Load-average and memory circuit breakers no longer stampede or trip on transient pre-GC heap peaks#4514
Conversation
gerlowskija
left a comment
There was a problem hiding this comment.
This generally looks pretty close IMO. I have two remaining big questions:
- Do we really need to write TtlSampledMetric ourselves, or is there something off the shelf (e.g. Caffeine?) that we could use instead?
- Why the MemoryCB change to start considering various memory "pools"? If this is a bug, it'd be great to that acknowledged and a sentence or two on it somewhere for posterity to understand this part of the change down the road...
Note that those are "just" questions though; they may not even require changes so much as clarification 🤷
Looks great Mark!
| * the supplier. The {@code refreshing} flag is always released, so a thrown sampler does not wedge | ||
| * the single-flight latch. | ||
| */ | ||
| final class TtlSampledMetric<T> { |
There was a problem hiding this comment.
[Q] The implementation here is great and I applaud your concurrency code. But I'm a little surprised that we'd have to implement this functionality ourselves and that there's nothing "off the shelf" that we could use instead. Surely we're not the first folks out there to calculate primitives with a TTL.
Did you consider (and discard) the idea of using Caffeine or something similar here instead? Or might that be worth considering?
There was a problem hiding this comment.
Yes, Caffeine can do this: refreshAfterWrite provides a single-flight refresh that serves the stale value to other callers while one thread refreshes, which is largely the behavior I want.
But this is one value, not a keyed map. Caffeine's refreshAfterWrite also refreshes async. It uses ForkJoinPool.commonPool() unless you provide your own executor. This changes the timing from "at most one sample per window" to "at most one in-flight refresh that completes whenever the pool runs it." A heap-pool walk or load-average syscall ends up on the shared pool, which I didn't like. The reasons are similar to my concerns about using a simple Timer, refreshing async, and writing to a shared volatile. This method instead gives me the exact behavior I want on a critical hot path. I also did not want to allocate cache entries or wrap a key just for a single primitive.
So I considered it and decided to stay with the small class. It is about 50 lines that we have exact control over, with no new dependencies that force our use case onto them on the hot path. No extra executors, no extra threads, no nondeterministic timing, no extra allocations.
There was a problem hiding this comment.
Yeah, it's small in that it's 50 LOC and no-dependencies. But even concise concurrency code is complex.
If you've considered alternatives then I'm sure you've thought about it more than I have and I trust your judgement there. So, SGTM. Just wanted to make sure the tradeoff was called out 👍
| * <p>Pool selection by collector: | ||
| * | ||
| * <ul> | ||
| * <li><b>G1 / Parallel / Serial / generational ZGC:</b> uses the pool whose name matches the |
There was a problem hiding this comment.
[Q] Can you provide a bit more context on this and the stuff at L75-L112 below? Specifically: what value does this differentiation serve? MemoryCircuitBreaker was agnostic of pool and GC-algorithm prior to this PR and there's nothing in SOLR-18284 that points to that being an issue.
Is this a fix for an additional bug that you found while working on this PR? If so, is that additional bug documented somewhere?
There was a problem hiding this comment.
Yes, this is the fix, not a side change. I will say that plainly in the JIRA and the class javadoc.
MemoryCircuitBreaker was agnostic to the pool and GC algorithms, but that was because it could do the wrong thing at times. The old breaker read MemoryMXBean.getHeapMemoryUsage().getUsed() That is whole-heap occupancy, including garbage that has not yet been collected. On a generational collector, that number climbs toward max between collections during normal operation. That is the steady-state shape of a healthy JVM, not heap pressure to trigger a circuit breaker. Using 99% of the heap is meaningless - that can be 1 second before a harmless GC drops it to 10%. The breaker tripped on pre-GC peaks that the next collection was about to reclaim.
getCollectionUsage() on the old-gen pool reports occupancy right after the last collection - live data, not garbage, not live data + garbage. That is the only reading that answers "Is the heap actually full?"
getCollectionUsage() is per-pool, so you have to pick a pool, and the one you want is the old/tenured generation where long-lived data is, and each collector names that differently:
- G1 →
G1 Old Gen - Parallel →
PS Old Gen - Serial →
Tenured Gen - generational ZGC →
ZGC Old Generation
Those match the \b(Old|Tenured)\b name pattern, so the circuit breaker reads getCollectionUsage() on that pool. Non-generational ZGC and Shenandoah are different, and some yet to be introduced collectors may be too - they have one combined heap pool and no old-gen name to match. When nothing matches the pattern, the fallback sums getCollectionUsage() across every HEAP-typed pool instead. So the name match plus fallback is there because the pool layout changes depending on the collector.
There was a problem hiding this comment.
Gotcha - appreciate the clarification. So the "correct" metric is only exposed at the level of individual pools; makes more sense, thanks!
…de or trip on transient pre-GC heap peaks Load-average and memory circuit breakers were doing more harm than good under high concurrency. LoadAverageCircuitBreaker called OperatingSystemMXBean.getSystemLoadAverage() on every request. The OS load average is a one-minute moving average, so re-polling it per-request is wasted work - and when many requests arrived concurrently, the syscall stampede hammered the CPU, which is the condition this breaker exists to prevent rather than cause. MemoryCircuitBreaker sampled MemoryMXBean.getHeapMemoryUsage().getUsed() on a 30-second moving average. With a generational collector that signal climbs toward max between collections during normal operation; the breaker would trip on transient pre-GC peaks that GC was about to reclaim.
12b2237 to
9de1fd3
Compare
gerlowskija
left a comment
There was a problem hiding this comment.
LGTM, and thanks for the thorough replies!
(Gentle reminder to avoid force-push in the future if possible. A force-push breaks GH's "Changes since your last review" distinction that I find a huge timesaver.)
Load-average and memory circuit breakers were doing more harm than good under high concurrency.
LoadAverageCircuitBreaker called OperatingSystemMXBean.getSystemLoadAverage() on every request. The OS load average is a one-minute moving average, so re-polling it per-request is wasted work - and when many requests arrived concurrently, the syscall stampede hammered the CPU, which is the condition this breaker exists to prevent rather than cause.
MemoryCircuitBreaker sampled MemoryMXBean.getHeapMemoryUsage().getUsed() on a 30-second moving average. With a generational collector that signal climbs toward max between collections during normal operation; the breaker would trip on transient pre-GC peaks that GC was about to reclaim.
https://issues.apache.org/jira/browse/SOLR-18284