Hot/fix: Stack overflow crash when loading multiple translation units in parallel #74

Description

@SizzleUnrlsd

Summary

coretrace-stack-analyzer can crash when it analyzes a compile_commands.json
batch with more than one worker. The failure happens while the analyzer compiles
source files to LLVM IR through compilerlib::compile(...), which runs Clang
frontend actions in-process.

The observed crash is not caused by CoreTrace CLI parsing or by cross-TU summary
logic. It is triggered by parallel module loading/compilation inside the stack
analyzer.

Environment

  • Platform: macOS arm64
  • LLVM/Clang: Homebrew LLVM 20.1.2
  • Binary: ctrace, linked against libclang-cpp.dylib and libLLVM.dylib
  • Shell stack limit: 8176 KB
  • Hardware concurrency observed by the analyzer: 8

Reproduction

From the parent CoreTrace checkout that embeds coretrace-stack-analyzer:

./build/ctrace \
  --compile-commands=./build/compile_commands.json \
  --invoke ctrace_stack_analyzer \
  --config config/tool-config.json

With stack_analyzer.jobs unset/empty, the analyzer resolves to jobs=auto and
starts multiple workers.
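
The resolution step described above can be sketched as follows. This is a minimal illustration, not the analyzer's actual code: the helper name resolveJobs, the treatment of the literal string "auto", and the sanity cap are assumptions. The one grounded behaviour is that an unset or empty value resolves to a count based on hardware concurrency, which was 8 in the observed environment.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>
#include <thread>

// Hypothetical helper: maps the "jobs" config string to a worker count.
// Empty or "auto" resolves to the hardware concurrency (at least 1);
// otherwise the value must be a positive decimal integer.
std::optional<std::size_t> resolveJobs(const std::string& value)
{
    if (value.empty() || value == "auto") {
        // hardware_concurrency() may return 0 when the count is unknown.
        return std::max<std::size_t>(1, std::thread::hardware_concurrency());
    }
    std::size_t parsed = 0;
    for (char c : value) {
        if (c < '0' || c > '9') {
            return std::nullopt;  // reject signs, spaces, and other text
        }
        parsed = parsed * 10 + static_cast<std::size_t>(c - '0');
        if (parsed > 1024) {
            return std::nullopt;  // arbitrary sanity cap (assumption)
        }
    }
    if (parsed == 0) {
        return std::nullopt;  // "0" is not a usable worker count
    }
    return parsed;
}
```

Under this sketch, any machine that reports more than one hardware thread gets a multi-worker run by default, which is why the crash appears without any explicit jobs setting.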

The crash also reproduces with cross-TU disabled:

{
  "stack_analyzer": {
    "jobs": "2",
    "resource_cross_tu": false,
    "uninitialized_cross_tu": false
  }
}

jobs=2 is enough to reproduce. jobs=1 completes successfully.

Actual behavior

The process exits with a native crash shortly after:

== CoreTrace == [INFO] Running specific tools on 16 file(s)
== CoreTrace == [INFO] Running CoreTrace Stack Analyzer on 16 files
bus error

or:

illegal hardware instruction

Under lldb, the actual stop reason is an EXC_BAD_ACCESS in Clang Sema:

* thread #4, stop reason = EXC_BAD_ACCESS (code=2, address=0x16ff1bb58)
frame #0: libclang-cpp.dylib`CheckConvertibilityForTypeTraits(...) + 136

The failing instruction writes to the current stack:

libclang-cpp.dylib`CheckConvertibilityForTypeTraits:
-> stp x21, x24, [sp, #0x28]

Registers at the crash:

sp = 0x000000016ff1bb30
pc = libclang-cpp.dylib`CheckConvertibilityForTypeTraits(...) + 136

The faulting address (0x16ff1bb58) is sp + 0x28, and the stack pointer itself
lies in a region with no access permissions (---):

memory region $sp
[0x000000016ff18000-0x000000016ff1c000) ---

This points to a worker thread stack overflow while Clang is deeply instantiating
C++ templates.
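
The address arithmetic behind that conclusion can be checked mechanically. The constants below are copied from the lldb output above. The names are illustrative. Reading the 16 KiB no-access region as a stack guard region is an assumption, based on 16 KiB being the page size on arm64 macOS.

```cpp
#include <cstdint>

// Values copied from the lldb session in this report.
constexpr std::uint64_t kSp          = 0x000000016ff1bb30ULL;
constexpr std::uint64_t kFaultAddr   = 0x000000016ff1bb58ULL;
constexpr std::uint64_t kRegionBegin = 0x000000016ff18000ULL;
constexpr std::uint64_t kRegionEnd   = 0x000000016ff1c000ULL;  // exclusive

// True when `address` falls inside the inaccessible region reported by lldb.
constexpr bool inInaccessibleRegion(std::uint64_t address)
{
    return address >= kRegionBegin && address < kRegionEnd;
}

// The store `stp x21, x24, [sp, #0x28]` targets sp + 0x28.
static_assert(kSp + 0x28 == kFaultAddr, "fault address is sp + 0x28");

// Both the stack pointer and the store target are inside the "---" region.
static_assert(inInaccessibleRegion(kSp), "sp is in the inaccessible region");
static_assert(inInaccessibleRegion(kFaultAddr), "store target is in the region");

// The region spans 0x4000 bytes, i.e. 16 KiB.
static_assert(kRegionEnd - kRegionBegin == 0x4000, "region is 16 KiB");
```

A stack pointer that has already moved into a no-access region directly below a thread's stack is the usual signature of stack exhaustion rather than of a wild pointer write.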

Expected behavior

The analyzer should either:

  • complete analysis successfully, or
  • report a per-translation-unit compilation/loading failure without crashing the
    hosting process.

ctrace should not be terminated by a native crash from an embedded analyzer
worker.

Relevant code path

The CoreTrace bridge invokes the analyzer in-process:

ctrace::stack::app::runAnalyzerApp(std::move(parseResult.parsed));

The analyzer schedules module loading in worker threads:

runParallelWork(inputFilenames.size(), loadJobs,
                [&](std::size_t index) { loadSingleModule(index); });

Each worker calls:

analysis::loadModuleForAnalysis(inputFilename, cfg, *moduleContext, localErr);

The input pipeline compiles non-IR inputs through compilerlib:

return compilerlib::compile(compileArgs, outputMode);

compilerlib executes Clang frontend actions in-process, including:

clang::EmitBCAction
clang::EmitLLVMAction
clang::EmitLLVMOnlyAction

Verification results

The following matrix was observed:

Configuration                     Result
jobs=auto, cross-TU enabled       crashes
jobs=auto, cross-TU disabled      crashes
jobs=2, cross-TU disabled         crashes
jobs=1, cross-TU enabled          exits 0
jobs=1, cross-TU disabled         exits 0

This isolates the failure to parallel in-process Clang compilation/loading, not
to cross-TU resource or uninitialized summary construction.

Workaround

Set:

{
  "stack_analyzer": {
    "jobs": "1"
  }
}

This serializes module loading/compilation and avoids the stack overflow in the
observed environment.

Proposed fix direction

Do not assume that the jobs value used for analysis is also safe for in-process
Clang compilation. The current architecture is fast, but a crash inside the
Clang frontend takes down the whole analyzer process, and with it the hosting
ctrace process.
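
One way to decouple the two settings is to derive separate worker counts for loading and for analysis. The sketch below is only an illustration under stated assumptions: CompileExecution, JobPlan, and planJobs are hypothetical names and are not part of the analyzer today.

```cpp
#include <cstddef>

// Hypothetical policy names; not part of the analyzer's existing API.
enum class CompileExecution {
    InProcess,   // compilerlib runs Clang frontend actions inside the analyzer
    Subprocess,  // each translation unit is compiled by a separate process
};

struct JobPlan {
    std::size_t loadJobs;      // workers for source-to-IR compilation/loading
    std::size_t analysisJobs;  // workers for analysis on already-loaded modules
};

// Derives both worker counts from the single user-facing jobs value.
constexpr JobPlan planJobs(std::size_t requestedJobs, CompileExecution mode)
{
    // Treat 0 as "one worker" so callers never receive an empty plan.
    const std::size_t jobs = requestedJobs == 0 ? 1 : requestedJobs;
    // In-process Clang compilation is serialized; analysis stays parallel.
    const std::size_t loadJobs =
        (mode == CompileExecution::InProcess) ? 1 : jobs;
    return JobPlan{loadJobs, jobs};
}
```

With jobs=8 and in-process compilation, this yields one load worker and eight analysis workers. Serializing only helps if the single load job runs on a thread with enough stack. The report shows that jobs=1 succeeds, but it does not state which thread performs the work in that case.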

Recommended direction:

  1. Introduce a dedicated module loading / compile execution policy.
  2. Serialize source-to-IR compilation when using in-process compilerlib.
  3. Keep parallelism for analysis phases that operate on already-loaded modules.
  4. Prefer subprocess isolation for Clang compilation as the robust long-term
    path. If a subprocess crashes, the analyzer can report a failed TU instead of
    crashing the parent process.
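
Item 4 can be sketched with POSIX process APIs. The helper name runIsolated and the ChildResult type are hypothetical, and the command line that would compile a TU to IR is deliberately left to the caller, since the report does not define one. posix_spawnp is used rather than fork because the parent is multi-threaded.

```cpp
#include <cerrno>
#include <string>
#include <vector>

#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

// The process environment. Inside a macOS dylib, _NSGetEnviron() would be
// needed instead (assumption: this code lives in the executable).
extern "C" {
extern char** environ;
}

// Hypothetical result type for one isolated compile.
struct ChildResult {
    bool launched = false;  // the child process was started
    bool crashed = false;   // the child was terminated by a signal
    int exitCode = -1;      // valid when launched && !crashed
    int signal = 0;         // valid when crashed
};

// Runs argv[0] with the given arguments and waits for it to finish.
ChildResult runIsolated(const std::vector<std::string>& argv)
{
    ChildResult result;
    if (argv.empty()) {
        return result;
    }

    // Build a null-terminated argv array for posix_spawnp.
    std::vector<char*> cargv;
    for (const std::string& arg : argv) {
        cargv.push_back(const_cast<char*>(arg.c_str()));
    }
    cargv.push_back(nullptr);

    pid_t pid = 0;
    if (posix_spawnp(&pid, cargv[0], nullptr, nullptr, cargv.data(),
                     environ) != 0) {
        return result;  // could not start the child
    }
    result.launched = true;

    int status = 0;
    pid_t waited = 0;
    do {
        waited = waitpid(pid, &status, 0);
    } while (waited < 0 && errno == EINTR);
    if (waited < 0) {
        return result;
    }

    if (WIFSIGNALED(status)) {
        // A native crash in the child becomes data for the parent.
        result.crashed = true;
        result.signal = WTERMSIG(status);
    } else if (WIFEXITED(status)) {
        result.exitCode = WEXITSTATUS(status);
    }
    return result;
}
```

A caller would map crashed == true, or a non-zero exitCode, to a per-TU failure entry and continue with the remaining translation units.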

Increasing the worker thread stack size via platform-specific thread attributes
can make this specific crash less likely, but it is less robust than isolating
Clang compilation or serializing the in-process frontend. The problem is not
tied to one input: any template-heavy translation unit can exceed a worker
thread's stack when compiled in-process.
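
For reference, the stack-size mitigation could look like the sketch below, which uses POSIX thread attributes. runWithStackSize is a hypothetical helper, not an existing analyzer function. One relevant background fact, worth confirming on the target system: Apple documents a 512 KB default stack for secondary threads on macOS, so the 8176 KB shell limit listed under Environment would apply to the main thread only.

```cpp
#include <cstddef>
#include <functional>

#include <pthread.h>

// Thread entry point with C linkage, as pthread_create expects.
extern "C" void* stackSizedThreadEntry(void* arg)
{
    auto* work = static_cast<std::function<void()>*>(arg);
    (*work)();  // an exception escaping here would terminate the process
    return nullptr;
}

// Runs `work` to completion on a new thread whose stack is `stackBytes` large.
// Returns false if the attributes are rejected or the thread cannot be created.
// `stackBytes` must be at least PTHREAD_STACK_MIN; some platforms also require
// a multiple of the page size.
bool runWithStackSize(std::size_t stackBytes, std::function<void()> work)
{
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) {
        return false;
    }
    bool ok = pthread_attr_setstacksize(&attr, stackBytes) == 0;

    pthread_t thread;
    // `work` outlives the thread because we join before returning.
    ok = ok && pthread_create(&thread, &attr, stackSizedThreadEntry, &work) == 0;
    pthread_attr_destroy(&attr);
    if (!ok) {
        return false;
    }
    return pthread_join(thread, nullptr) == 0;
}
```

A call such as runWithStackSize(8u * 1024 * 1024, [&] { loadSingleModule(index); }) would give one load the same headroom as the main thread in the observed environment. It only raises the ceiling, though: a sufficiently deep template instantiation can still overflow it.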

Metadata

Labels: bug
