What happened
I hit an out-of-memory error locally while running cognee and traced it to the pipeline_runs table in the SQLite relational store. It had grown to 6.4 GB across ~23k rows (an average of ~280 KB per row). Everything else in the database (nodes, edges) came to under 100 MB combined, so this single table accounted for essentially the entire file.
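To see which table dominates a SQLite file, you can sum the stored byte length of the suspect column. The figures above are consistent: 6.4 GB / 23k rows is about 278 KB per row. The sketch below uses only the standard-library sqlite3 module and runs against a throwaway in-memory database so that it is self-contained. The pipeline_runs table name comes from this report. The assumption that each run's payload lives in a single TEXT column named run_info is mine and has not been checked against cognee's actual schema. To run it against a real store, point sqlite3.connect at the database file and adjust the table and column names.

```python
import json
import sqlite3


def column_payload_stats(conn: sqlite3.Connection, table: str, column: str):
    """Return (row_count, total_bytes, avg_bytes) for one column of `table`.

    Table and column names are interpolated directly (SQLite cannot bind
    identifiers as parameters), so only pass trusted names.
    """
    # CAST(... AS BLOB) makes LENGTH() count bytes rather than characters.
    row_count, total_bytes = conn.execute(
        f"SELECT COUNT(*), COALESCE(SUM(LENGTH(CAST({column} AS BLOB))), 0) FROM {table}"
    ).fetchone()
    avg_bytes = total_bytes / row_count if row_count else 0.0
    return row_count, total_bytes, avg_bytes


# Self-contained demo: a throwaway table shaped like the ASSUMED schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pipeline_runs (id INTEGER PRIMARY KEY, run_info TEXT)")
run_info = json.dumps({"data": "raw text " * 1_000})  # stand-in for a stringified payload
conn.executemany(
    "INSERT INTO pipeline_runs (run_info) VALUES (?)", [(run_info,)] * 5
)

rows, total, avg = column_payload_stats(conn, "pipeline_runs", "run_info")
print(f"{rows} rows, {total} bytes total, {avg:.0f} bytes per row")
```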
Cause
log_pipeline_run_start, log_pipeline_run_complete and log_pipeline_run_error all build run_info the same way:
```python
if not data:
    data_info = "None"
elif isinstance(data, list) and all(isinstance(item, Data) for item in data):
    data_info = [str(item.id) for item in data]
else:
    data_info = str(data)  # full payload stored verbatim
```
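Running the branch logic quoted above in isolation shows which inputs are stored as IDs and which are stored in full. This is a minimal sketch, not cognee code. Data here is a hypothetical stand-in dataclass that carries only an id, and build_data_info is a hypothetical wrapper around the three branches. Note that a plain list of strings also lands in the fallback branch, because its items are not Data instances.

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4


@dataclass
class Data:
    """Hypothetical stand-in for cognee's Data model; only .id matters here."""
    id: UUID = field(default_factory=uuid4)


def build_data_info(data):
    # Mirrors the branch quoted above from the log_pipeline_run_* functions.
    if not data:
        return "None"
    elif isinstance(data, list) and all(isinstance(item, Data) for item in data):
        return [str(item.id) for item in data]
    else:
        return str(data)  # full payload stored verbatim


raw_text = "lorem ipsum " * 25_000  # ~300 KB of raw text, as passed to add/remember
records = [Data(), Data()]

print(len(build_data_info(raw_text)))  # 300000: every character of the payload is kept
print(build_data_info(records))        # only the two UUID strings
```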
When data is not a list of Data records (e.g. raw text passed to add/remember), the entire payload gets stringified into run_info["data"] on every run. There is no size cap and no pruning, so the table grows without bound.
As far as I can tell, the column is never read back from the database anywhere in the codebase. The places that look like readers operate on the in-memory PipelineRunInfo object returned from cognify(), and they only touch .status / .pipeline_run_id. In other words, run_info["data"] is write-only audit data.
Impact
Opening the store pulls this large table into process memory, which is what caused the OOM on a memory-constrained machine.
This is the same failure mode that #2549 fixed for the queries / results search-history tables. pipeline_runs was not covered by that change, and the code is unchanged on dev.
Suggested fix
Cap the stringified payload (truncate to a few hundred chars with a marker recording the original length), leaving the Data-list and empty-input branches as they are. I have a PR ready that does this behind a shared helper plus a unit test.
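The proposed cap can be sketched as follows. This is an illustration of the idea and not the code in the PR. The names truncate_payload and MAX_DATA_INFO_CHARS, the 300-character limit and the wording of the marker are all hypothetical choices, and Data is again a stand-in dataclass. Only the fallback branch changes. The empty-input and Data-list branches behave exactly as before.

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4

# Hypothetical cap: the proposal only says "a few hundred chars".
MAX_DATA_INFO_CHARS = 300


@dataclass
class Data:
    """Hypothetical stand-in for cognee's Data model; only .id is used."""
    id: UUID = field(default_factory=uuid4)


def truncate_payload(text: str, limit: int = MAX_DATA_INFO_CHARS) -> str:
    """Return `text` unchanged if short, else its first `limit` chars plus a length marker."""
    if len(text) <= limit:
        return text
    return f"{text[:limit]}... [truncated, original length {len(text)} chars]"


def build_data_info(data):
    # Empty-input and Data-list branches are left exactly as they were.
    if not data:
        return "None"
    if isinstance(data, list) and all(isinstance(item, Data) for item in data):
        return [str(item.id) for item in data]
    # Only the fallback branch changes: the stringified payload is now capped.
    return truncate_payload(str(data))


info = build_data_info("lorem ipsum " * 25_000)
print(info[300:])  # ... [truncated, original length 300000 chars]
```

With a write-time cap, each new fallback row stays at a few hundred characters however large the input is. Rows that are already in the table keep their full payload.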
Environment
cognee 1.1.2 (also confirmed present on dev), SQLite relational backend.