fix(llm/ollama): offload blocking sync client calls off the event loop #3069
Open
ly-wang19 wants to merge 1 commit into
OllamaAPIAdapter wraps a synchronous OpenAI client (instructor.from_openai(OpenAI(...))), but its async methods called .create() directly, blocking the event loop for the full LLM round-trip and serializing concurrent callers. The generic/openai adapters use an async client (litellm.acompletion) and the llama_cpp adapter uses AsyncOpenAI; this adapter was the sync outlier.

Offload the three blocking client calls (acreate_structured_output, create_transcript, transcribe_image) via asyncio.to_thread, matching the pattern used elsewhere in cognee.

Adds a regression test asserting the structured-output call is dispatched through asyncio.to_thread.

Signed-off-by: ly-wang19 <ly-wang19@users.noreply.github.com>
Contributor
Hello @ly-wang19, thank you for submitting a PR! We will respond as soon as possible.
What
`OllamaAPIAdapter` wraps a synchronous `OpenAI` client (`instructor.from_openai(OpenAI(...))`), but its `async def` methods call `.create()` directly, so each LLM round-trip blocks the event loop and serializes any concurrent async callers behind it.

Every other adapter is already non-blocking: `generic_llm_api`/`openai` use `instructor.from_litellm(litellm.acompletion)` with `await`, and the sibling `llama_cpp` adapter uses `AsyncOpenAI`. Ollama was the lone synchronous outlier.

Change
Offload the three blocking client calls (`acreate_structured_output`, `create_transcript`, `transcribe_image`) via `asyncio.to_thread`, a pattern already used across cognee (e.g. `tasks/ingestion`, `shared/cache`). This keeps the existing synchronous Ollama client but moves the blocking call off the event loop; there is no other behavior change.

Test
Adds `cognee/tests/unit/infrastructure/llm/test_ollama_adapter.py`, which spies on `asyncio.to_thread` and asserts that `acreate_structured_output` dispatches the (synchronous) client call through it. Reverting the offload fails the test.

`pytest cognee/tests/unit/infrastructure/llm/test_ollama_adapter.py` → 1 passed.

(Happy to switch to `AsyncOpenAI` + `await` to mirror the `llama_cpp` adapter instead, if you prefer the true-async client over the thread offload.)
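For reviewers who want to see the effect in isolation, here is a minimal sketch of the thread-offload pattern this PR describes. It is not the adapter code from the diff: `FakeSyncClient`, `SketchAdapter`, `blocking_call`, `offloaded_call`, and `timed` are hypothetical names, and `time.sleep` stands in for the LLM round-trip. Only `asyncio.to_thread` and `asyncio.gather` are real APIs (Python 3.9+).

```python
import asyncio
import time


class FakeSyncClient:
    """Hypothetical stand-in for a synchronous client (not the real instructor/OpenAI client)."""

    def create(self, prompt: str) -> str:
        time.sleep(0.2)  # simulates a blocking LLM round-trip
        return f"response to {prompt}"


class SketchAdapter:
    """Hypothetical adapter used for illustration; not cognee's OllamaAPIAdapter."""

    def __init__(self) -> None:
        self.client = FakeSyncClient()

    async def blocking_call(self, prompt: str) -> str:
        # Sync call made directly inside a coroutine: it runs on the
        # event-loop thread, so no other task can progress meanwhile.
        return self.client.create(prompt)

    async def offloaded_call(self, prompt: str) -> str:
        # Same sync call, run in a worker thread; the event loop stays
        # free to schedule other coroutines while it waits.
        return await asyncio.to_thread(self.client.create, prompt)


async def timed(method, n: int = 3) -> float:
    """Run n concurrent calls of `method` and return the elapsed wall time."""
    start = time.perf_counter()
    await asyncio.gather(*(method(f"q{i}") for i in range(n)))
    return time.perf_counter() - start


async def main() -> tuple[float, float]:
    adapter = SketchAdapter()
    serial = await timed(adapter.blocking_call)
    parallel = await timed(adapter.offloaded_call)
    return serial, parallel


if __name__ == "__main__":
    serial, parallel = asyncio.run(main())
    print(f"blocking: {serial:.2f}s, offloaded: {parallel:.2f}s")
```

With three concurrent callers, the blocking variant should take roughly three sleeps back to back, while the offloaded variant should finish in about the time of one, since the worker threads sleep in parallel. The trade-off against a true-async client is that each in-flight call occupies a thread from the default executor.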