Fix PooledHashMap dropping live entries when an entry is removed during forEach#8499
Conversation
|
|
9a1d1bb to
dae39c2
Compare
…ved during iteration
dae39c2 to
2d49da6
Compare
jack-berg
left a comment
There was a problem hiding this comment.
Nice catch. Build is failing because of formatting. Please run ./gradlew spotlessApply, and commit / push the changes.
| // 1 and 17 land in the same bucket, like collection removing a stale series mid-iteration. | ||
| PooledHashMap<Integer, Integer> collidingMap = new PooledHashMap<>(); | ||
| collidingMap.put(1, 1); | ||
| collidingMap.put(17, 17); |
There was a problem hiding this comment.
This test relies on the default capacity changing. Can you either add a comment indicating that if the default capacity changes this test will have to change, or use a different key type that uses a constant hashcode such that keys are guaranteed to be in the same bucket?
There was a problem hiding this comment.
Done, thanks. Used a constant-hashcode key so the collision doesn't depend on capacity anymore
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #8499 +/- ##
============================================
- Coverage 78.77% 78.50% -0.27%
- Complexity 8579 8601 +22
============================================
Files 1009 1013 +4
Lines 28993 29150 +157
Branches 3599 3632 +33
============================================
+ Hits 22839 22885 +46
- Misses 5311 5421 +110
- Partials 843 844 +1 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
@jack-berg Now all jobs are succeeding and tests too, I believe it's ready now :) |
|
Thank you for your contribution @abdessattar23! 🎉 We would like to hear from you about your experience contributing to OpenTelemetry by taking a few minutes to fill out this survey. |
Noticed
PooledHashMap.forEachcan skip an entry if the callback removes the one it's currently on. It walks each bucket by index, soremove()shifts the next entry into the current slot andi++jumps over itThis bites
AsynchronousMetricStoragein REUSABLE_DATA mode: during collection it removes stale series while iterating, so a live series sharing a bucket with a stale one can get dropped from that cycle. For cumulative async that shows up as a gap then a jump in the output. (IMMUTABLE_DATA usesConcurrentHashMapand was fine. The code comment even says the map was picked to allow removal during iteration, it just wasn't holding up)Fix: iterate each bucket back-to-front so removing the current entry only shifts already-visited ones. Added tests at the map level (collision plus remove-while-iterating) and the storage level (live series survive when stale ones drop)