Add mremap and shmat/shmdt instrumentation#555

Closed
r1viollet wants to merge 9 commits into main from r1viollet/add-mremap-instrumentation

@r1viollet r1viollet commented Jun 23, 2026

Collaborator

What

Adds allocation tracking hooks for mremap() and System V shared memory (shmat()/shmdt()).

Why

These APIs can allocate or deallocate memory without going through malloc/mmap, leading to under-reporting in allocation profiles. Notable examples:

  • mremap() is used to grow large allocations (e.g., df-executor mmaps)
  • shmat()/shmdt() are used by databases (PostgreSQL, Oracle) for System V shared memory

Changes

Instrumentation added (src/lib/symbol_overrides.cc):

  • mremap() - tracks old region as dealloc + new region as alloc
  • shmat() - queries segment size via shmctl(IPC_STAT) and tracks as alloc
  • shmdt() - tracks as dealloc
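The mremap() bookkeeping described above can be sketched as follows. This is an illustration only: `RegionTracker` and `tracked_mremap` are hypothetical names, not the code in src/lib/symbol_overrides.cc, and the sketch assumes the hook only reports events when the call succeeds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <sys/mman.h>

// Hypothetical stand-in for the allocation tracker (not the ddprof API).
struct RegionTracker {
  std::map<std::uintptr_t, std::size_t> live; // start address -> size

  void track_alloc(void *addr, std::size_t size) {
    assert(addr != MAP_FAILED);
    live[reinterpret_cast<std::uintptr_t>(addr)] = size;
  }
  void track_dealloc(void *addr) {
    live.erase(reinterpret_cast<std::uintptr_t>(addr));
  }
  std::size_t live_bytes() const {
    std::size_t total = 0;
    for (const auto &entry : live) {
      total += entry.second;
    }
    return total;
  }
};

// Calls mremap() and, on success, reports the old region as a deallocation
// and the resulting region as an allocation.
inline void *tracked_mremap(RegionTracker &tracker, void *old_addr,
                            std::size_t old_size, std::size_t new_size,
                            int flags) {
  void *new_addr = mremap(old_addr, old_size, new_size, flags);
  if (new_addr == MAP_FAILED) {
    return new_addr; // a failed mremap leaves the old mapping in place
  }
  tracker.track_dealloc(old_addr);
  tracker.track_alloc(new_addr, new_size);
  return new_addr;
}
```

Reporting the pair even when the address does not move keeps the tracked size in step with the mapping, at the cost of one extra event per resize.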

Not instrumented (with comment explaining why):

  • brk()/sbrk() - these move the program break rather than create individual allocations. Tracking them would double-count memory whenever malloc uses them internally, and the semantics differ: a larger heap boundary means memory is available, not that objects were allocated.

Tests added (test/allocation_tracker-ut.cc):

  • mremap test via test_realloc pattern
  • shmat/shmdt test with IPC_PRIVATE segment
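For reference, the attach-time size query that the shmat/shmdt test exercises might look roughly like this. `Attachment` and `attach_and_measure` are hypothetical names; the only calls used are the standard System V ones (shmat, shmctl with IPC_STAT), and the PR's actual hook may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/ipc.h>
#include <sys/shm.h>

// Hypothetical result type: where the segment was attached and how large
// the kernel reports it to be.
struct Attachment {
  void *addr;
  std::size_t size;
};

// Attaches the segment, then asks the kernel for its size via IPC_STAT.
// shmat() reports failure with (void *)-1; in that case size stays 0.
inline Attachment attach_and_measure(int shmid) {
  void *addr = shmat(shmid, nullptr, 0);
  if (addr == reinterpret_cast<void *>(-1)) {
    return {addr, 0};
  }
  shmid_ds info{};
  std::size_t size = 0;
  if (shmctl(shmid, IPC_STAT, &info) == 0) {
    size = info.shm_segsz; // size requested at shmget() time
  }
  return {addr, size};
}
```

A test along the lines described would create the segment with `shmget(IPC_PRIVATE, size, IPC_CREAT | 0600)`, attach it through the helper, detach with `shmdt()`, and remove it with `shmctl(id, IPC_RMID, nullptr)`.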

All tests gated by weak symbol checks.
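The symbols the tests actually check are not shown on this page. Assuming the gating uses the usual GCC/Clang weak-reference idiom, a sketch could look like the following, where `hypothetical_alloc_hooks_marker` and `run_if_hooked` are made-up names for illustration.

```cpp
#include <cassert>
#include <utility>

// Hypothetical marker symbol. Declared weak, so its address resolves to
// null when no linked or preloaded library defines it.
extern "C" void hypothetical_alloc_hooks_marker() __attribute__((weak));

inline bool alloc_hooks_present() {
  return hypothetical_alloc_hooks_marker != nullptr;
}

// Runs the test body only when the hooks are available.
// Returns true if the body ran, false if it was skipped.
template <typename Fn> bool run_if_hooked(Fn &&test_body) {
  if (!alloc_hooks_present()) {
    return false;
  }
  std::forward<Fn>(test_body)();
  return true;
}
```

In a binary that does not define the marker, `run_if_hooked` skips the body, which is how a test can stay green on platforms where a hook is unavailable.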

Verification

Unit tests pass for the new hooks. Full CI validation pending.

r1viollet added 9 commits May 26, 2026 12:32
Workers are restarted by forking a fresh process from the parent, which
loses everything in DDProfWorkerContext — including the heap-tracking
aggregator in LiveAllocation. Until natural alloc/free traffic refills
the map, live-heap is undercounted for the rest of the target's life.

Add a serialisation path that survives the fork:

- main_loop allocates a memfd that the parent keeps open and every
  worker child inherits.
- On 'restart_worker', the outgoing child resolves its UnwindOutput
  handles back to portable strings (via libdatadog Function2/Mapping2
  read-back) and writes a self-owned snapshot to the memfd.
- The new child reads the snapshot in worker_library_init, re-interns
  mappings/functions into its fresh ProfilesDictionary and rebuilds the
  LiveAllocation maps before the poll loop starts draining events.
- LiveAllocation owns a string deque backing the string_views of
  restored UnwindOutputs; live entries built from incoming events keep
  using Process/base-frame views.
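The fd-inheritance part of this design can be illustrated with a small Linux-only sketch. It is not the ddprof code: the payload is an opaque string rather than a serialised LiveAllocation map, and `write_snapshot`, `read_snapshot`, `run_in_child` and `snapshot_survives_restart` are hypothetical names. It uses pread/pwrite because forked children share the file offset of an inherited descriptor.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Replaces the memfd contents with `payload`.
inline bool write_snapshot(int fd, const std::string &payload) {
  assert(fd >= 0);
  if (ftruncate(fd, 0) != 0) {
    return false;
  }
  std::size_t done = 0;
  while (done < payload.size()) {
    const ssize_t n = pwrite(fd, payload.data() + done, payload.size() - done,
                             static_cast<off_t>(done));
    if (n <= 0) {
      return false;
    }
    done += static_cast<std::size_t>(n);
  }
  return true;
}

// Reads the whole memfd from offset 0.
inline std::string read_snapshot(int fd) {
  assert(fd >= 0);
  std::string out;
  char buf[4096];
  off_t offset = 0;
  for (;;) {
    const ssize_t n = pread(fd, buf, sizeof(buf), offset);
    if (n <= 0) {
      break;
    }
    out.append(buf, static_cast<std::size_t>(n));
    offset += n;
  }
  return out;
}

// Forks, runs `body` in the child, and reports whether it returned true.
template <typename Fn> bool run_in_child(Fn body) {
  const pid_t pid = fork();
  if (pid < 0) {
    return false;
  }
  if (pid == 0) {
    _exit(body() ? 0 : 1);
  }
  int status = 0;
  return waitpid(pid, &status, 0) == pid && WIFEXITED(status) &&
      WEXITSTATUS(status) == 0;
}

// The parent owns the memfd; the first child writes a snapshot and exits,
// and a freshly forked second child reads it back.
inline bool snapshot_survives_restart(const std::string &payload) {
  const int fd = memfd_create("live_alloc_snapshot", 0);
  if (fd < 0) {
    return false;
  }
  const bool written =
      run_in_child([&] { return write_snapshot(fd, payload); });
  const bool restored =
      written && run_in_child([&] { return read_snapshot(fd) == payload; });
  close(fd);
  return restored;
}
```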

Budget enforcement, value-preserving:

- Default target 4 MB, hard ceiling 20 MB.
- When over budget, rank stacks by aggregate value and drop the lowest;
  their addresses are remapped to a synthetic [live-alloc cleared]
  common frame so per-PID heap totals remain correct.
- If still over after dropping all stacks, drop entire PIDs from the
  lowest aggregate value upwards.
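The first budget step (drop the lowest-value stacks while keeping totals intact) can be sketched as below. This is a simplified model under stated assumptions: `StackEntry`, `BudgetedSnapshot`, `apply_budget` and the per-entry `cost_bytes` are hypothetical, the cost of the remapped addresses themselves is ignored, and the per-PID fallback is not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct StackEntry {
  std::string stack;      // hypothetical portable stack key
  std::int64_t value;     // aggregate live bytes attributed to this stack
  std::size_t cost_bytes; // hypothetical serialised size of this entry
};

struct BudgetedSnapshot {
  std::vector<StackEntry> kept;   // highest value first
  std::int64_t cleared_value = 0; // bytes moved to "[live-alloc cleared]"
  std::size_t cleared_stacks = 0;
};

// Drops the lowest-value stacks until the serialised cost fits the budget.
// Dropped value is accumulated under the synthetic frame, so the sum of
// kept values plus cleared_value equals the original total.
inline BudgetedSnapshot apply_budget(std::vector<StackEntry> stacks,
                                     std::size_t budget_bytes) {
  std::sort(stacks.begin(), stacks.end(),
            [](const StackEntry &a, const StackEntry &b) {
              return a.value > b.value;
            });
  std::size_t total_cost = 0;
  for (const auto &entry : stacks) {
    total_cost += entry.cost_bytes;
  }
  BudgetedSnapshot snapshot;
  while (total_cost > budget_bytes && !stacks.empty()) {
    const StackEntry &lowest = stacks.back();
    total_cost -= lowest.cost_bytes;
    snapshot.cleared_value += lowest.value;
    ++snapshot.cleared_stacks;
    stacks.pop_back();
  }
  assert(total_cost <= budget_bytes || stacks.empty());
  snapshot.kept = std::move(stacks);
  return snapshot;
}
```

For example, three stacks worth 100, 10 and 50 bytes at 40 bytes of cost each, under an 80-byte budget, keep the 100 and 50 stacks and move 10 bytes to the cleared bucket, so the total stays at 160.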

In-flight events between the old child exit and the new child's first
poll are still lost; a library-side pause hook is a separate change.

Add a third live-heap variant to simple_malloc-ut.sh that drives the
worker into at least one reset (upload_period=2s, worker_period=2) with
--skip-free 100 keeping ~99% of allocations live, and checks:

  - at least one '[live-alloc] Snapshot restored' log line
  - zero 'Tracked address count mismatch' warnings between the profiler
    and the in-target library after restore

Adds ~7s to the simple_malloc suite (target needs to outlive 2 export
cycles). Same test runs under DD_PROFILING_REORDER_EVENTS=1 too.

clang-tidy errors flagged by CI:
- readability-math-missing-parentheses on sizeof(T) * N + ... arithmetic
- cppcoreguidelines-avoid-const-or-ref-data-members on Writer::_out
  (switch the reference member to a non-owning pointer)
- readability-uppercase-literal-suffix (0u -> 0U)
- misc-const-correctness on loop indices (uint32_t idx -> uint32_t const idx)

Also adds a TODO block above portable_to_uo() spelling out the four
overlapping caches (ProfilesDictionary, SymbolTable/MapInfoTable,
RuntimeSymbolLookup et al., _restored_strings), the duplicate-entry
cost we accept on the restore path, and how a future PR can unify the
model by making FunLoc identity content-based on libdatadog handles.

- DD_PROFILING_NATIVE_LIVE_ALLOC_SNAPSHOT_MAX_BYTES overrides the
  per-capture budget. Capped at the hard ceiling. Lets tests force
  the cleared-stack remap path and the dropped-pid fallback without
  rebuilding the binary.
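A capped override of this kind could be parsed roughly as follows. The variable name comes from the commit message; everything else is an assumption for illustration, including the binary interpretation of "4 MB" and "20 MB", the fallback to the default on unparsable input, and the names `snapshot_budget_from_env`, `k_default_budget_bytes` and `k_hard_ceiling_bytes`.

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <cstddef>
#include <cstdlib>

// Assumed values: the commit message says 4 MB and 20 MB without stating
// whether these are binary or decimal megabytes.
constexpr std::size_t k_default_budget_bytes = 4 * 1024 * 1024;
constexpr std::size_t k_hard_ceiling_bytes = 20 * 1024 * 1024;

// Reads the per-capture budget from the environment, falling back to the
// default when unset or unparsable, and never exceeding the hard ceiling.
inline std::size_t snapshot_budget_from_env() {
  const char *raw =
      std::getenv("DD_PROFILING_NATIVE_LIVE_ALLOC_SNAPSHOT_MAX_BYTES");
  if (raw == nullptr || !std::isdigit(static_cast<unsigned char>(*raw))) {
    return k_default_budget_bytes;
  }
  errno = 0;
  char *end = nullptr;
  const unsigned long long parsed = std::strtoull(raw, &end, 10);
  if (errno != 0 || *end != '\0') {
    return k_default_budget_bytes;
  }
  if (parsed > k_hard_ceiling_bytes) {
    return k_hard_ceiling_bytes;
  }
  return static_cast<std::size_t>(parsed);
}
```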

- simple_malloc --unique-sites N spreads allocations across up to 256
  templated alloc_at_site<Tag> instantiations, each producing a
  distinct innermost frame to the unwinder. Used to stress-test the
  snapshot path with many unique stacks per cycle.
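The site-spreading idea can be sketched with a table of template instantiations. Only the name `alloc_at_site<Tag>` and the limit of 256 come from the commit message; `make_site_table`, `site_for` and the dispatch by index are assumptions about how such a helper might be built with GCC or Clang.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <utility>

constexpr std::size_t k_max_sites = 256;
using AllocFn = void *(*)(std::size_t);

// One instantiation per Tag gives the unwinder a distinct innermost frame.
template <std::size_t Tag>
__attribute__((noinline)) void *alloc_at_site(std::size_t size) {
  void *ptr = std::malloc(size);
  // The empty asm statement after the call keeps malloc out of tail
  // position, so this frame is still on the stack during the allocation.
  asm volatile("" : : "r"(ptr) : "memory");
  return ptr;
}

template <std::size_t... Tags>
constexpr std::array<AllocFn, sizeof...(Tags)>
make_site_table(std::index_sequence<Tags...>) {
  return {{&alloc_at_site<Tags>...}};
}

// Picks the call site for the index-th allocation, cycling through at most
// `unique_sites` instantiations (clamped to the range 1..k_max_sites).
inline AllocFn site_for(std::size_t index, std::size_t unique_sites) {
  static constexpr auto table =
      make_site_table(std::make_index_sequence<k_max_sites>{});
  if (unique_sites == 0) {
    unique_sites = 1;
  }
  if (unique_sites > k_max_sites) {
    unique_sites = k_max_sites;
  }
  return table[index % unique_sites];
}
```

A driver loop would then call `site_for(i, n)(size)` for each allocation, which yields up to `n` distinct stacks per cycle.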

Verified locally at three budget levels: full preservation, cleared
remap (stacks=30 cleared=582 dropped_pids=0 at 240 KB), and pid drop
(dropped_pids=1 at 8 KB). All paths keep 'Tracked address count
mismatch' warnings at zero in the steady state.

Extends allocation tracking to cover additional memory allocation APIs
that were previously missing:

1. **mremap()** - Remap/resize existing mmap regions
   - Tracks old region as deallocation + new region as allocation
   - Commonly used by allocators to grow large allocations

2. **shmat()/shmdt()** - System V shared memory attach/detach
   - Queries segment size via shmctl(IPC_STAT) on attach
   - Commonly used by databases (PostgreSQL, Oracle) and legacy IPC

3. **sbrk()** - Increment program break
   - Tracks positive increments as allocations
   - Used by some allocators and legacy code

4. **brk()** - Set absolute program break
   - No tracking (requires maintaining state)
   - Rarely called directly

These APIs can bypass malloc/mmap hooks when called directly or via
syscalls, leading to under-reporting of memory usage in allocation
profiles.

Verifies that the new allocation tracking hooks work correctly:
- mremap: tests realloc-style semantics (dealloc old + alloc new)
- shmat/shmdt: tests System V shared memory attach/detach
- sbrk: tests heap expansion (positive increment only)

brk() and sbrk() manipulate the program break (heap boundary), not
individual allocations. Tracking them would:

1. Double-count: if malloc uses brk/sbrk internally, we'd track both
   the heap growth AND the malloc allocations from that heap
2. Have wrong semantics: sbrk(1MB) means "1MB available" not "1MB allocated"
3. Require state: need to track what's actually used vs just available

Since malloc hooks already catch allocations made from the heap,
instrumenting brk/sbrk would only add noise and confusion.
@r1viollet r1viollet changed the title from "Add mremap, shmat/shmdt, and sbrk/brk instrumentation" to "Add mremap and shmat/shmdt instrumentation" Jun 23, 2026
@r1viollet

Collaborator Author

Replaced by #556 (clean single commit off main)

@r1viollet r1viollet closed this Jun 23, 2026