pipeline_runs table grows without bound: run_info stores full input payload on every run #3074

@P7AC1D

What happened

I hit a local out-of-memory while running cognee and traced it to the pipeline_runs table in the SQLite relational store. It had grown to 6.4 GB across ~23k rows (averaging ~280 KB each). Everything else in the database (nodes, edges) was under 100 MB combined, so this single table was essentially the entire file.
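For anyone who wants to check their own store, here is a minimal sketch of how the footprint of one column could be measured. It uses only the standard-library sqlite3 module. The table name pipeline_runs comes from this report; the column name run_info and the helper name column_footprint are assumptions made for illustration, so adjust them to the actual schema.

```python
import sqlite3


def column_footprint(conn, table="pipeline_runs", column="run_info"):
    """Return (row_count, total_bytes, avg_bytes) for one column of one table.

    Hypothetical helper: the default table and column names are assumptions.
    """
    # Identifiers cannot be bound as SQL parameters, so quote them explicitly.
    q_table = '"' + table.replace('"', '""') + '"'
    q_column = '"' + column.replace('"', '""') + '"'
    # LENGTH() of a BLOB counts bytes, so cast the stored value first.
    sql = (
        f"SELECT COUNT(*), COALESCE(SUM(LENGTH(CAST({q_column} AS BLOB))), 0) "
        f"FROM {q_table}"
    )
    row_count, total_bytes = conn.execute(sql).fetchone()
    avg_bytes = total_bytes / row_count if row_count else 0.0
    return row_count, total_bytes, avg_bytes
```

Calling column_footprint(sqlite3.connect(path)), where path points at the SQLite file, returns the row count together with the total and average payload size in bytes.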

Cause

log_pipeline_run_start, log_pipeline_run_complete and log_pipeline_run_error all build run_info the same way:

if not data:
    data_info = "None"
elif isinstance(data, list) and all(isinstance(item, Data) for item in data):
    data_info = [str(item.id) for item in data]
else:
    data_info = str(data)   # full payload stored verbatim

When data is not a list of Data records (e.g. raw text passed to add/remember), the entire payload gets stringified into run_info["data"] on every run. There is no size cap and no pruning, so the table grows without bound.

As far as I can tell the column is never read back from the database anywhere in the codebase. The places that look like readers operate on the in-memory PipelineRunInfo object returned from cognify(), and only touch .status / .pipeline_run_id. So run_info["data"] is write-only audit data.

Impact

Opening the store pulls this large table into process memory, which is what caused the OOM on a memory-constrained machine.

This is the same failure mode that #2549 fixed for the queries / results search-history tables. pipeline_runs was not covered by that change, and the code is unchanged on dev.

Suggested fix

Cap the stringified payload (truncate to a few hundred chars with a marker recording the original length), leaving the Data-list and empty-input branches as they are. I have a PR ready that does this behind a shared helper plus a unit test.
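A minimal sketch of what such a cap could look like follows. It is not the code from the PR: the names truncate_payload, build_data_info and MAX_DATA_INFO_CHARS, the 500-character limit, and the marker wording are all assumptions, and the Data class is a stand-in so the example runs on its own.

```python
from dataclasses import dataclass

MAX_DATA_INFO_CHARS = 500  # hypothetical cap; "a few hundred chars"


@dataclass
class Data:
    """Stand-in for the real Data record; only the id attribute is used."""
    id: object


def truncate_payload(text, max_chars=MAX_DATA_INFO_CHARS):
    """Return text unchanged if short enough, else a prefix plus a length marker."""
    if len(text) <= max_chars:
        return text
    return f"{text[:max_chars]}... [truncated, original length {len(text)} chars]"


def build_data_info(data, max_chars=MAX_DATA_INFO_CHARS):
    """Mirror the three branches quoted above, capping only the last one."""
    if not data:
        return "None"
    if isinstance(data, list) and all(isinstance(item, Data) for item in data):
        return [str(item.id) for item in data]
    # Previously str(data) was stored verbatim; now it is bounded in size.
    return truncate_payload(str(data), max_chars)
```

With this shape, each stored value in the raw-payload branch is bounded by max_chars plus the short marker, while the Data-list and empty-input branches behave as before.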

Environment

cognee 1.1.2 (also confirmed present on dev), SQLite relational backend.
