Skip to content

opentelemetry-sdk: drop NaN measurements at instrument level to prevent aggregation poisoning#5336

Open
RudraDudhat2509 wants to merge 2 commits into
open-telemetry:mainfrom
RudraDudhat2509:fix/drop-nan-measurements
Open

opentelemetry-sdk: drop NaN measurements at instrument level to prevent aggregation poisoning#5336
RudraDudhat2509 wants to merge 2 commits into
open-telemetry:mainfrom
RudraDudhat2509:fix/drop-nan-measurements

Conversation

@RudraDudhat2509

@RudraDudhat2509 RudraDudhat2509 commented Jun 19, 2026

Copy link
Copy Markdown

Summary

  • NaN measurements passed to metric instruments (e.g. counter.add(float("nan"))) were silently folded into running aggregations, permanently poisoning the sum, all subsequent exports emitted nan with no recovery path
  • Root cause: the existing if amount < 0 guard on Counter and Histogram is bypassed by NaN because all IEEE 754 comparisons against NaN return False
  • Fix: add a math.isnan() check at the instrument level (same layer as the existing negative-value guard) for all sync instruments (Counter, UpDownCounter, Histogram, Gauge) and in the async callback path, matching the approach taken in the Java SDK (open-telemetry/opentelemetry-java#5846, fixed in PR #5859)

Closes #5330

What was tested

  • Added test_add_nan to TestCounter and TestUpDownCounter
  • Added test_record_nan to TestHistogram
  • Added test_set_nan to TestGauge
  • Added test_nan_callback_value_is_dropped to TestObservableGauge (verifies NaN from async callbacks is skipped, valid measurements still pass through)
  • All 30 instrument tests pass (python -m pytest opentelemetry-sdk/tests/metrics/test_instrument.py)
  • Verified end-to-end: counter.add(1) → add(NaN) → add(5) now yields sum = 6, not nan

@RudraDudhat2509 RudraDudhat2509 requested a review from a team as a code owner June 19, 2026 16:15
@RudraDudhat2509

Copy link
Copy Markdown
Author

here is a ss of the fix for the teams reference

image

Comment thread .changelog/5336.fixed
@@ -0,0 +1 @@
`opentelemetry-sdk`: drop NaN measurements at the instrument level to prevent permanent aggregation poisoning

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changelog fragment number should match the PR number

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recent commit addresses the same, thx for pointing it out !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

opentelemetry-sdk: NaN measurements permanently poison metric aggregations

2 participants