Skip to content

Support OpenAI Chat Completions tool_calls/role=tool in guideline trajectory parsing #223

@gaodan-fang

Description

@gaodan-fang

Summary

altk-evolve guideline generation currently parses OpenAI Agents SDK-style assistant function calls, but does not fully support OpenAI Chat Completions tool-calling transcript format.

Evidence (current main)

Parser: altk_evolve/llm/guidelines/guidelines.py (parse_openai_agents_trajectory)

Current behavior:

  1. Assistant tool calls are parsed only when message.role == "assistant" and message.content is a list with items of {"type":"function_call", ...}.
  2. Top-level Chat Completions field assistant.tool_calls is not parsed.
  3. role == "tool" messages (tool outputs keyed by tool_call_id) are not parsed into observations.
  4. If first user message content is non-string, parser raises (First user message was not a task instruction.), which is strict for modern message variants.

save_trajectory in altk_evolve/frontend/mcp/mcp_server.py passes the loaded messages directly to generate_guidelines(messages), so parser coverage determines what reaches guideline synthesis.

Why this matters

Many agents produce standard OpenAI Chat Completions transcripts:

  • assistant: {role:"assistant", content:"", tool_calls:[...]}
  • tool: {role:"tool", tool_call_id:"...", content:"..."}

With current parsing, these trajectories lose tool-action/observation information in guideline generation.

Requested change

Update parse_openai_agents_trajectory to support both formats:

  1. OpenAI Chat Completions format
  • Parse assistant.tool_calls into action steps.
  • Parse role="tool" messages into observation steps and associate by tool_call_id where possible.
  1. OpenAI Agents SDK format (existing)
  • Keep support for assistant.content list blocks with type="function_call".
  1. Compatibility and normalization
  • Accept mixed transcripts safely.
  • Be tolerant of dict/list/string content variants without hard-failing the whole trajectory.
  • Keep unknown/unsupported blocks non-fatal (skip + log), instead of raising for every unexpected content item.

Suggested acceptance criteria

  1. tests/unit/test_guidelines.py adds parser coverage for:
  • assistant top-level tool_calls (Chat Completions)
  • role="tool" observation messages
  • mixed agents-style + chat-completions-style transcripts
  • non-string first user message handling fallback
  1. generate_guidelines trajectory summary includes:
  • action lines for tool invocations
  • observation lines for tool outputs/errors
  1. No regression for existing agents-style test cases.

Notes

This keeps altk-evolve general across agents and avoids requiring agent-specific side channels.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions