Skip to content

fix(llm/ollama): offload blocking sync client calls off the event loop#3069

Open
ly-wang19 wants to merge 1 commit into
topoteretes:mainfrom
ly-wang19:fix/ollama-adapter-blocking-event-loop
Open

fix(llm/ollama): offload blocking sync client calls off the event loop#3069
ly-wang19 wants to merge 1 commit into
topoteretes:mainfrom
ly-wang19:fix/ollama-adapter-blocking-event-loop

Conversation

@ly-wang19

Copy link
Copy Markdown
Contributor

What

OllamaAPIAdapter wraps a synchronous OpenAI client (instructor.from_openai(OpenAI(...))), but its async def methods call .create() directly — so each LLM round-trip blocks the event loop, serializing any concurrent async callers behind it.

Every other adapter is already non-blocking: generic_llm_api / openai use instructor.from_litellm(litellm.acompletion) with await, and the sibling llama_cpp adapter uses AsyncOpenAI. Ollama was the lone synchronous outlier.

Change

Offload the three blocking client calls — acreate_structured_output, create_transcript, transcribe_image — via asyncio.to_thread (a pattern already used across cognee, e.g. tasks/ingestion, shared/cache). This keeps the existing synchronous Ollama client but moves the blocking call off the event loop; no other behavior change.

Test

Adds cognee/tests/unit/infrastructure/llm/test_ollama_adapter.py, which spies on asyncio.to_thread and asserts acreate_structured_output dispatches the (synchronous) client call through it. Reverting the offload fails the test.

pytest cognee/tests/unit/infrastructure/llm/test_ollama_adapter.py1 passed.

(Happy to switch to AsyncOpenAI + await to mirror the llama_cpp adapter instead, if you prefer the true-async client over the thread offload.)

OllamaAPIAdapter wraps a synchronous OpenAI client
(instructor.from_openai(OpenAI(...))), but its async methods called .create()
directly, blocking the event loop for the full LLM round-trip and serializing
concurrent callers. The generic/openai adapters use an async client
(litellm.acompletion) and the llama_cpp adapter uses AsyncOpenAI -- this adapter
was the sync outlier.

Offload the three blocking client calls (acreate_structured_output,
create_transcript, transcribe_image) via asyncio.to_thread, matching the pattern
used elsewhere in cognee. Adds a regression test asserting the structured-output
call is dispatched through asyncio.to_thread.

Signed-off-by: ly-wang19 <ly-wang19@users.noreply.github.com>
@ly-wang19 ly-wang19 requested a review from Vasilije1990 as a code owner June 14, 2026 14:17
@github-actions

Copy link
Copy Markdown
Contributor

Hello @ly-wang19, thank you for submitting a PR! We will respond as soon as possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant