Add mremap and shmat/shmdt instrumentation #555
Closed
r1viollet wants to merge 9 commits into
Workers are restarted by forking a fresh process from the parent, which loses everything in DDProfWorkerContext, including the heap-tracking aggregator in LiveAllocation. Until natural alloc/free traffic refills the map, live-heap is undercounted for the rest of the target's life.

Add a serialisation path that survives the fork:
- main_loop allocates a memfd that the parent keeps open and every worker child inherits.
- On 'restart_worker', the outgoing child resolves its UnwindOutput handles back to portable strings (via libdatadog Function2/Mapping2 read-back) and writes a self-owned snapshot to the memfd.
- The new child reads the snapshot in worker_library_init, re-interns mappings/functions into its fresh ProfilesDictionary and rebuilds the LiveAllocation maps before the poll loop starts draining events.
- LiveAllocation owns a string deque backing the string_views of restored UnwindOutputs; live entries built from incoming events keep using Process/base-frame views.

Budget enforcement, value-preserving:
- Default target 4 MB, hard ceiling 20 MB.
- When over budget, rank stacks by aggregate value and drop the lowest; their addresses are remapped to a synthetic [live-alloc cleared] common frame so per-PID heap totals remain correct.
- If still over after dropping all stacks, drop entire PIDs from the lowest aggregate value upwards.

In-flight events between the old child exit and the new child's first poll are still lost; a library-side pause hook is a separate change.
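The value-preserving budget step above can be sketched as follows. This is a minimal illustration, not ddprof's implementation: `StackEntry`, `TrimResult` and `trim_to_budget` are hypothetical names, the per-stack encoded size is assumed to be known up front, and the cost of the synthetic frame itself is ignored.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one unique stack in the snapshot.
struct StackEntry {
  int64_t value;            // aggregate live bytes attributed to this stack
  std::size_t encoded_size; // bytes this stack costs in the serialised snapshot
  bool cleared = false;     // true once folded into the synthetic frame
};

struct TrimResult {
  std::size_t cleared_count = 0; // number of stacks dropped
  int64_t cleared_value = 0;     // value moved to the "[live-alloc cleared]" frame
  std::size_t total_size = 0;    // snapshot size after trimming
};

// Drops the lowest-value stacks until the snapshot fits in budget_bytes.
// The dropped stacks' value is accumulated rather than discarded, so the
// per-PID total (kept stacks + cleared_value) is unchanged.
inline TrimResult trim_to_budget(std::vector<StackEntry> &stacks,
                                 std::size_t budget_bytes) {
  TrimResult res;
  for (const auto &s : stacks) {
    res.total_size += s.encoded_size;
  }
  // Visit stacks from lowest to highest aggregate value.
  std::vector<std::size_t> order(stacks.size());
  for (std::size_t i = 0; i < order.size(); ++i) {
    order[i] = i;
  }
  std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
    return stacks[a].value < stacks[b].value;
  });
  for (std::size_t const idx : order) {
    if (res.total_size <= budget_bytes) {
      break;
    }
    stacks[idx].cleared = true;
    res.total_size -= stacks[idx].encoded_size;
    res.cleared_value += stacks[idx].value;
    ++res.cleared_count;
  }
  return res;
}
```

If every stack is cleared and the result is still over budget, the commit message describes a second fallback (dropping whole PIDs); that step is not shown here.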
Add a third live-heap variant to simple_malloc-ut.sh that drives the
worker into at least one reset (upload_period=2s, worker_period=2) with
--skip-free 100 keeping ~99% of allocations live, and checks:
- at least one '[live-alloc] Snapshot restored' log line
- zero 'Tracked address count mismatch' warnings between the profiler
and the in-target library after restore
Adds ~7s to the simple_malloc suite (target needs to outlive 2 export
cycles). Same test runs under DD_PROFILING_REORDER_EVENTS=1 too.
clang-tidy errors flagged by CI:
- readability-math-missing-parentheses on sizeof(T) * N + ... arithmetic
- cppcoreguidelines-avoid-const-or-ref-data-members on Writer::_out (switch the reference member to a non-owning pointer)
- readability-uppercase-literal-suffix (0u -> 0U)
- misc-const-correctness on loop indices (uint32_t idx -> uint32_t const idx)

Also adds a TODO block above portable_to_uo() spelling out the four overlapping caches (ProfilesDictionary, SymbolTable/MapInfoTable, RuntimeSymbolLookup et al., _restored_strings), the duplicate-entry cost we accept on the restore path, and how a future PR can unify the model by making FunLoc identity content-based on libdatadog handles.
- DD_PROFILING_NATIVE_LIVE_ALLOC_SNAPSHOT_MAX_BYTES overrides the per-capture budget. Capped at the hard ceiling. Lets tests force the cleared-stack remap path and the dropped-pid fallback without rebuilding the binary.
- simple_malloc --unique-sites N spreads allocations across up to 256 templated alloc_at_site<Tag> instantiations, each producing a distinct innermost frame to the unwinder. Used to stress-test the snapshot path with many unique stacks per cycle.

Verified locally at three budget levels: full preservation, cleared remap (stacks=30 cleared=582 dropped_pids=0 at 240 KB), and pid drop (dropped_pids=1 at 8 KB). All paths keep 'Tracked address count mismatch' warnings at zero in the steady state.
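A possible shape for the environment override, shown as a sketch only. The variable name, the 4 MB default and the 20 MB ceiling come from the commit messages; the function name `snapshot_budget_from_env`, the plain-decimal byte format, and the fall-back-to-default behaviour on unparsable input are assumptions, not the profiler's confirmed behaviour.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdlib>

constexpr std::size_t k_default_budget = 4U * 1024U * 1024U; // 4 MB default target
constexpr std::size_t k_hard_ceiling = 20U * 1024U * 1024U;  // 20 MB hard ceiling

// Returns the per-capture budget: the env override when it parses as a
// decimal byte count, the default otherwise, and never more than the ceiling.
inline std::size_t snapshot_budget_from_env() {
  const char *raw =
      std::getenv("DD_PROFILING_NATIVE_LIVE_ALLOC_SNAPSHOT_MAX_BYTES");
  if (raw == nullptr || *raw == '\0' || *raw == '-') {
    return k_default_budget; // assumption: unset or negative means "use default"
  }
  char *end = nullptr;
  errno = 0;
  unsigned long long const parsed = std::strtoull(raw, &end, 10);
  if (end == raw || *end != '\0' || errno != 0) {
    return k_default_budget; // assumption: ignore unparsable values
  }
  if (parsed > k_hard_ceiling) {
    return k_hard_ceiling; // override is capped at the hard ceiling
  }
  return static_cast<std::size_t>(parsed);
}
```

With this shape, a test can set the variable to a small value such as 8192 to force the drop paths without rebuilding.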
Extends allocation tracking to cover additional memory allocation APIs that were previously missing:

1. **mremap()** - Remap/resize existing mmap regions
   - Tracks old region as deallocation + new region as allocation
   - Commonly used by allocators to grow large allocations
2. **shmat()/shmdt()** - System V shared memory attach/detach
   - Queries segment size via shmctl(IPC_STAT) on attach
   - Commonly used by databases (PostgreSQL, Oracle) and legacy IPC
3. **sbrk()** - Increment program break
   - Tracks positive increments as allocations
   - Used by some allocators and legacy code
4. **brk()** - Set absolute program break
   - No tracking (requires maintaining state)
   - Rarely called directly

These APIs can bypass malloc/mmap hooks when called directly or via syscalls, leading to under-reporting of memory usage in allocation profiles.
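The mremap() semantics described above (old region as deallocation, new region as allocation) can be illustrated with a small wrapper. This is a sketch under assumptions, not the hook in src/lib/symbol_overrides.cc: `TrackEvent` and `tracked_mremap` are hypothetical names, events go to a caller-supplied vector instead of the profiler's tracker, and the optional fifth argument used with MREMAP_FIXED is omitted. It is Linux-specific.

```cpp
#include <sys/mman.h>

#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical event record; the real tracker's types are not shown in the PR.
struct TrackEvent {
  bool is_alloc;
  uintptr_t addr;
  std::size_t size;
};

// Calls mremap() and, on success, records the old region as a deallocation
// and the new region as an allocation, even when the address did not move.
inline void *tracked_mremap(void *old_addr, std::size_t old_size,
                            std::size_t new_size, int flags,
                            std::vector<TrackEvent> &events) {
  void *new_addr = ::mremap(old_addr, old_size, new_size, flags);
  if (new_addr == MAP_FAILED) {
    return new_addr; // the old mapping is untouched, so nothing to record
  }
  events.push_back({false, reinterpret_cast<uintptr_t>(old_addr), old_size});
  events.push_back({true, reinterpret_cast<uintptr_t>(new_addr), new_size});
  return new_addr;
}
```

Recording both events unconditionally keeps the logic the same whether the kernel grows the mapping in place or moves it.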
Verifies that the new allocation tracking hooks work correctly:
- mremap: tests realloc-style semantics (dealloc old + alloc new)
- shmat/shmdt: tests System V shared memory attach/detach
- sbrk: tests heap expansion (positive increment only)
brk() and sbrk() manipulate the program break (heap boundary), not individual allocations. Tracking them would:
1. Double-count: if malloc uses brk/sbrk internally, we'd track both the heap growth AND the malloc allocations from that heap
2. Have wrong semantics: sbrk(1MB) means "1MB available" not "1MB allocated"
3. Require state: need to track what's actually used vs just available

Since malloc hooks already catch allocations made from the heap, instrumenting brk/sbrk would only add noise and confusion.
Replaced by #556 (clean single commit off main)
What
Adds allocation tracking hooks for mremap() and System V shared memory (shmat()/shmdt()).

Why
These APIs can allocate or deallocate memory without going through malloc/mmap, leading to under-reporting in allocation profiles. Notable examples:
- mremap() is used to grow large allocations (e.g., df-executor mmaps)
- shmat()/shmdt() are used by databases (PostgreSQL, Oracle) for System V shared memory

Changes
Instrumentation added (src/lib/symbol_overrides.cc):
- mremap() - tracks old region as dealloc + new region as alloc
- shmat() - queries segment size via shmctl(IPC_STAT) and tracks as alloc
- shmdt() - tracks as dealloc

Not instrumented (with comment explaining why):
- brk()/sbrk() - manipulate the program break, not individual allocations. Would double-count if malloc uses them internally, and have wrong semantics (heap boundary vs allocated objects).

Tests added (test/allocation_tracker-ut.cc):
- mremap test via test_realloc pattern
- shmat/shmdt test with IPC_PRIVATE segment

All tests gated by weak symbol checks.
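The shmat() item under Changes depends on asking the kernel for the segment size, since shmat() itself does not return one. A minimal sketch of that lookup follows; `shm_segment_size`, `ShmAttach` and `tracked_shmat` are hypothetical names for illustration, and the actual hook in symbol_overrides.cc may be structured differently.

```cpp
#include <sys/ipc.h>
#include <sys/shm.h>

#include <cstddef>

// Returns the segment size reported by shmctl(IPC_STAT), or 0 if the query fails.
inline std::size_t shm_segment_size(int shmid) {
  struct shmid_ds ds {};
  if (::shmctl(shmid, IPC_STAT, &ds) != 0) {
    return 0;
  }
  return ds.shm_segsz;
}

// Hypothetical result type: the attach address plus the size to report.
struct ShmAttach {
  void *addr;
  std::size_t size;
};

// Attaches the segment and looks up its size so the attach can be reported
// as an allocation of that many bytes at addr.
inline ShmAttach tracked_shmat(int shmid, const void *hint, int flags) {
  void *addr = ::shmat(shmid, hint, flags);
  if (addr == reinterpret_cast<void *>(-1)) {
    return {addr, 0}; // attach failed, nothing to track
  }
  return {addr, shm_segment_size(shmid)};
}
```

A test along the lines of the IPC_PRIVATE one listed above would create a segment with shmget(IPC_PRIVATE, ...), attach it through the wrapper, check the reported size, then detach with shmdt() and remove the segment with shmctl(IPC_RMID).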
Verification
Unit tests pass for the new hooks. Full CI validation pending.