Skip to content

feat(profiler): wall-clock signal suppression for idle threads#560

Open
kaahos wants to merge 26 commits into
mainfrom
paul.fournillon/wallclock-suppression
Open

feat(profiler): wall-clock signal suppression for idle threads#560
kaahos wants to merge 26 commits into
mainfrom
paul.fournillon/wallclock-suppression

Conversation

@kaahos

@kaahos kaahos commented May 29, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?:

Before sending SIGVTALRM, the wall-clock timer checks whether the target thread is already in a skippable OS state and has already produced a sample during this blocking run. If so, the signal is suppressed. The first sample of every blocking run is always collected.

Motivation:

High-frequency wall-clock sampling on threads spending most of their time idle generates many redundant samples of identical stacks at high CPU cost. The precheck lets us emit one representative sample per blocking run and skip the rest, recovering signal throughput for threads that are actually running.

This is the first of two PRs splitting paul.fournillon/wallclock_precheck. This PR ships the suppression infrastructure. The follow-up PR adds datadog.TaskBlock event recording.

Additional Notes:

How to test the change?:

New unit tests:

  • ddprof-lib/src/test/cpp/park_state_ut.cpp
  • ddprof-lib/src/test/cpp/wallprecheck_args_ut.cpp

New integration tests:

  • PrecheckTest: verifies suppression fires and reduces signal count.
  • PrecheckEfficiencyTest: measures overhead reduction under sustained park.
  • WallclockMitigationsCombinedTest: end-to-end combined mitigations scenario.

Run locally with:
./gradlew :ddprof-test:test --tests '*Precheck*' --tests '*WallclockMitigations*' --tests '*WallclockTimer*'

For Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles
    credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.
  • JIRA: [JIRA-14354]

Unsure? Have a question? Request a review!

@datadog-prod-us1-5

datadog-prod-us1-5 Bot commented May 29, 2026

Copy link
Copy Markdown

Pipelines

Fix all issues with BitsAI

⚠️ Warnings

🚦 14 Pipeline jobs failed

DataDog/java-profiler | benchmarks-candidate-aarch64: [alloc]   View in Datadog   GitLab

DataDog/java-profiler | benchmarks-candidate-aarch64: [cpu,wall,alloc,memleak]   View in Datadog   GitLab

DataDog/java-profiler | benchmarks-candidate-aarch64: [cpu,wall]   View in Datadog   GitLab

View all 14 failed jobs.

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 893f2fa | Docs | Datadog PR Page | Give us feedback!

@kaahos kaahos changed the title Paul.fournillon/wallclock suppression feat(profiler): wall-clock signal suppression for sleeping/parked threads May 29, 2026
@dd-octo-sts

dd-octo-sts Bot commented May 29, 2026

Copy link
Copy Markdown
Contributor

CI Test Results

Run: #27819097390 | Commit: 53ae346 | Duration: 13m 17s (longest job)

All 32 test jobs passed

Status Overview

JDK glibc-aarch64/debug glibc-amd64/debug musl-aarch64/debug musl-amd64/debug
8 - - -
8-ibm - - -
8-j9 - -
8-librca - -
8-orcl - - -
11 - - -
11-j9 - -
11-librca - -
17 - -
17-graal - -
17-j9 - -
17-librca - -
21 - -
21-graal - -
21-librca - -
25 - -
25-graal - -
25-librca - -

Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled

Summary: Total: 32 | Passed: 32 | Failed: 0


Updated: 2026-06-19 10:18:21 UTC

@kaahos kaahos force-pushed the paul.fournillon/wallclock-suppression branch 5 times, most recently from 3f4fe1e to bcf405b Compare May 29, 2026 13:22
@kaahos kaahos force-pushed the paul.fournillon/wallclock-suppression branch from bcf405b to b1cb73f Compare May 29, 2026 13:39
Copilot finished work on behalf of kaahos May 29, 2026 15:57
@kaahos

kaahos commented May 29, 2026

Copy link
Copy Markdown
Contributor Author

@copilot review this PR

Copilot AI commented May 29, 2026

Copy link
Copy Markdown
Contributor

@copilot review this PR

Reviewed the changes; no blocking issues found in the wall-clock precheck logic or tests.

Copilot finished work on behalf of kaahos May 29, 2026 16:19
@kaahos kaahos marked this pull request as ready for review June 1, 2026 13:49
@kaahos kaahos requested a review from a team as a code owner June 1, 2026 13:49
@kaahos kaahos changed the title feat(profiler): wall-clock signal suppression for sleeping/parked threads feat(profiler): wall-clock signal suppression for idle threads Jun 1, 2026
@kaahos kaahos force-pushed the paul.fournillon/wallclock-suppression branch from acb7295 to 8e3c14d Compare June 3, 2026 14:04
@kaahos kaahos force-pushed the paul.fournillon/wallclock-suppression branch 2 times, most recently from 7c27a81 to 306aa49 Compare June 8, 2026 23:23
@kaahos kaahos force-pushed the paul.fournillon/wallclock-suppression branch from 306aa49 to 8bb1fed Compare June 8, 2026 23:51
Comment thread ddprof-lib/src/main/cpp/javaApi.cpp Outdated
Comment thread ddprof-lib/src/main/cpp/javaApi.cpp
Comment thread ddprof-lib/src/main/cpp/threadFilter.cpp
Comment thread ddprof-lib/src/main/cpp/wallClock.cpp Outdated
Comment thread ddprof-lib/src/main/cpp/threadFilter.h
Comment thread ddprof-lib/src/main/cpp/wallClockCounters.h Outdated
Comment thread ddprof-lib/src/main/cpp/wallClockCounters.h
@jbachorik

Copy link
Copy Markdown
Collaborator

[Sphinx Review — HIGH · CONSENSUS] wallClock.cpp · WallClockJvmti::timerLoop · completeness

WallClockJvmti::timerLoop() and signalHandler() have no wallprecheck logic. WallClockJvmti::initialize() never reads args._wall_precheck and the class has no _precheck member. When jvmtistacks=true is combined with wallprecheck=true (valid on HotSpot JDK 11+ where canRequestStackTrace() returns true), the precheck suppression feature is silently a no-op — no timer-thread fast-path, no signal-handler suppression, no counter increments.

Suggestion: Either (a) add the same once-per-run precheck logic to WallClockJvmti::timerLoop/signalHandler mirroring the WallClockASGCT path (switching to timerLoopCommon<ThreadEntry> with collect(entries)), or (b) document the incompatibility and emit a warning/error in Profiler::start() when both jvmtistacks and wallprecheck are enabled.

Confirmed by adversary.

@jbachorik

Copy link
Copy Markdown
Collaborator

[Sphinx Review — MEDIUM] thread.h · ProfiledThread park infrastructure · necessity

_park_start_ticks and _park_context are captured on every LockSupport.park call but parkExit0 immediately discards both output values. FLAG_PARKED is set/cleared but never consulted in any production path. setJavaThread was made more expensive (CAS loop) to preserve FLAG_PARKED that nothing reads. The PR description says TaskBlock event recording is in PR2.

Suggestion: Either strip _park_start_ticks, _park_context, and FLAG_PARKED from ProfiledThread until they are consumed in PR2, or add a Jira ticket reference linking this scaffolding to the follow-on work. If kept, revert setJavaThread to the simpler non-CAS form until there is a real reader for FLAG_PARKED.

@jbachorik

Copy link
Copy Markdown
Collaborator

[Sphinx Review — LOW · CONSENSUS] thread.h · setJavaThread comment · consistency

The comment justifying the CAS RMW states parkEnter/parkExit run "in signal-handler context" but they are JNI calls from Java application threads, not signal handlers. The concurrent access that justifies the CAS is between the owner thread calling parkEnter/parkExit (JNI) and a different thread calling setJavaThread (JVMTI onThreadStart or vmStructs).

Suggestion: Replace with: "CAS RMW to update only TYPE_MASK bits without clobbering FLAG_PARKED, which may be written concurrently by parkEnter/parkExit JNI calls on the owner thread while a JVMTI callback modifies thread type on a different thread."

@kaahos

kaahos commented Jun 12, 2026

Copy link
Copy Markdown
Contributor Author

@copilot review this PR.

@kaahos

kaahos commented Jun 12, 2026

Copy link
Copy Markdown
Contributor Author

most of the previous comments have been addressed @jbachorik! The branch now also includes the new features coming from main.

@rkennke rkennke left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice improvement! I have some questions...

Comment thread ddprof-lib/src/main/cpp/threadFilter.h Outdated
Comment thread ddprof-lib/src/main/cpp/thread.h Outdated
Comment thread ddprof-lib/src/main/cpp/livenessTracker.cpp
Comment thread ddprof-lib/src/main/cpp/hotspot/vmStructs.inline.h Outdated
Comment thread ddprof-lib/src/main/cpp/hotspot/vmStructs.inline.h Outdated
@kaahos kaahos force-pushed the paul.fournillon/wallclock-suppression branch from 0940fbe to 3df9af6 Compare June 19, 2026 09:22
@kaahos kaahos requested a review from rkennke June 19, 2026 11:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants