fix(pipelines): bound run_info data to prevent unbounded pipeline_runs growth by P7AC1D · Pull Request #3075 · topoteretes/cognee

P7AC1D · 2026-06-15T08:32:34Z

Description

While debugging a local out-of-memory error I traced it to the pipeline_runs table in the SQLite store, which had grown to several GB (~23k rows, ~280 KB each). Every pipeline run logs a run_info row, and log_pipeline_run_start / complete / error store the input payload via the str(data) fallback whenever data is not a list of Data records. In my case raw text passed to add / remember was being stored verbatim on every run.
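As a rough sanity check on the "several GB" figure, the arithmetic can be sketched as below. The row count and per-row size are the approximate numbers quoted above; the helper name is hypothetical and 1 KB is assumed to mean 1024 bytes.

```python
def estimate_table_bytes(rows: int, bytes_per_row: int) -> int:
    """Hypothetical helper: payload bytes held by a table of equally sized rows."""
    return rows * bytes_per_row


# Approximate figures from the description: ~23k rows at ~280 KB each.
# Assumption: 1 KB = 1024 bytes.
total = estimate_table_bytes(rows=23_000, bytes_per_row=280 * 1024)

print(f"{total / 1024**3:.1f} GiB")  # prints "6.1 GiB"
```

This ignores SQLite page and index overhead, so the on-disk size would be somewhat larger.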

That column is never read back from the database anywhere (the apparent readers operate on the in-memory PipelineRunInfo returned by cognify() and only use .status / .pipeline_run_id), so it is write-only audit data that grows without limit, and opening the store pulls it into process memory.

This is the same failure mode #2549 fixed for the queries / results search-history tables; pipeline_runs was not covered there.

The fix extracts the shared branch into summarize_run_info_data() and caps the stringified payload at 512 chars with a truncation marker that records the original length. Empty input still maps to "None", and lists of Data records still reduce to their ids, so existing behaviour is unchanged for those cases.
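A minimal sketch of the behaviour described above, not the actual diff. The function name, the 512-char cap and the marker text come from this description. The `Data` stand-in, the constant name, the `max_chars` parameter, the separating space before the marker, and returning ids as strings are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any
from uuid import UUID, uuid4

MAX_RUN_INFO_DATA_CHARS = 512  # cap stated in the description; constant name is assumed


@dataclass
class Data:
    """Hypothetical stand-in for cognee's Data record; only `id` matters here."""
    id: UUID = field(default_factory=uuid4)


def summarize_run_info_data(data: Any, max_chars: int = MAX_RUN_INFO_DATA_CHARS):
    # Empty input still maps to the string "None".
    if not data:
        return "None"
    # Lists of Data records still reduce to their ids (stringified here by assumption).
    if isinstance(data, list) and all(isinstance(item, Data) for item in data):
        return [str(item.id) for item in data]
    # Everything else goes through the str(data) fallback, now bounded.
    text = str(data)
    if len(text) <= max_chars:
        return text
    # Keep the first max_chars characters and record the original length.
    return f"{text[:max_chars]} [truncated, {len(text)} chars total]"
```

With this shape, a 1000-character string is stored as its first 512 characters plus the marker, while short strings pass through unchanged.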

Closes #3074.

Acceptance Criteria

run_info["data"] no longer stores unbounded payloads: large inputs are truncated to 512 chars followed by a [truncated, N chars total] marker.
Existing behaviour preserved for empty input ("None") and lists of Data (list of ids).
New unit test covers empty / Data-list / small / large payloads.

Local test run:

cognee/tests/unit/modules/pipelines/test_summarize_run_info_data.py ....   [4 passed]

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Code refactoring
Other (please specify):

Pre-submission Checklist

I have tested my changes thoroughly before submitting this PR (See CONTRIBUTING.md)
This PR contains minimal changes necessary to address the issue/feature
My code follows the project's coding standards and style guidelines
I have added tests that prove my fix is effective or that my feature works
I have added necessary documentation (if applicable)
All new and existing tests pass
I have searched existing PRs to ensure this change hasn't been submitted already
I have linked any relevant issues in the description
My commits have clear and descriptive messages

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

…s growth

log_pipeline_run_start/complete/error stored the full stringified input payload in run_info["data"] on every run via the str(data) fallback. This column is never read back from the database, so for large inputs (e.g. raw text passed to add/cognify) the pipeline_runs table grows without bound.

Extract the shared summarisation into summarize_run_info_data() and cap the stringified payload at 512 chars with a truncation marker, mirroring the intent of topoteretes#2549 for the search-history tables. Empty input and lists of Data records are unchanged.

Vasilije1990 · 2026-06-15T14:05:15Z

@P7AC1D does it make sense to truncate? This is useful for async runs, when you want to get the status of a pipeline and understand how and where things are in the flow.

We'll check internally and see whether we can simplify, but in general, running SQLite is not recommended for prod workloads.

P7AC1D requested a review from Vasilije1990 as a code owner June 15, 2026 08:32

P7AC1D wants to merge 1 commit into topoteretes:dev from P7AC1D:fix/bound-pipeline-run-info-data
