feat(profiler): wall-clock signal suppression for idle threads#560
feat(profiler): wall-clock signal suppression for idle threads#560kaahos wants to merge 26 commits into
Conversation
|
CI Test ResultsRun: #27819097390 | Commit:
Status Overview
Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled Summary: Total: 32 | Passed: 32 | Failed: 0 Updated: 2026-06-19 10:18:21 UTC |
3f4fe1e to
bcf405b
Compare
bcf405b to
b1cb73f
Compare
|
@copilot review this PR |
Reviewed the changes; no blocking issues found in the wall-clock precheck logic or tests. |
acb7295 to
8e3c14d
Compare
7c27a81 to
306aa49
Compare
306aa49 to
8bb1fed
Compare
|
[Sphinx Review — HIGH · CONSENSUS]
Suggestion: Either (a) add the same once-per-run precheck logic to
|
|
[Sphinx Review — MEDIUM]
Suggestion: Either strip |
|
[Sphinx Review — LOW · CONSENSUS] The comment justifying the CAS RMW states Suggestion: Replace with: "CAS RMW to update only TYPE_MASK bits without clobbering FLAG_PARKED, which may be written concurrently by |
|
@copilot review this PR. |
|
most of the previous comments have been addressed @jbachorik! The branch now also includes the new features coming from main. |
rkennke
left a comment
There was a problem hiding this comment.
This is a nice improvement! I have some questions...
0940fbe to
3df9af6
Compare
What does this PR do?:
Before sending
SIGVTALRM, the wall-clock timer checks whether the target thread is already in a skippable OS state and has already produced a sample during this blocking run. If so, the signal is suppressed. The first sample of every blocking run is always collected.Motivation:
High-frequency wall-clock sampling on threads spending most of their time idle generates many redundant samples of identical stacks at high CPU cost. The precheck lets us emit one representative sample per blocking run and skip the rest, recovering signal throughput for threads that are actually running.
This is the first of two PRs splitting
paul.fournillon/wallclock_precheck. This PR ships the suppression infrastructure. The follow-up PR addsdatadog.TaskBlockevent recording.Additional Notes:
How to test the change?:
New unit tests:
ddprof-lib/src/test/cpp/park_state_ut.cppddprof-lib/src/test/cpp/wallprecheck_args_ut.cppNew integration tests:
PrecheckTest: verifies suppression fires and reduces signal count.PrecheckEfficiencyTest: measures overhead reduction under sustained park.WallclockMitigationsCombinedTest: end-to-end combined mitigations scenario.Run locally with:
./gradlew :ddprof-test:test --tests '*Precheck*' --tests '*WallclockMitigations*' --tests '*WallclockTimer*'For Datadog employees:
credentials of any kind, I've requested a review from
@DataDog/security-design-and-guidance.Unsure? Have a question? Request a review!