docs(benchmarks): note ponytail's cost is task-shaped in agentic runs (#121) by Lakshya77089 · Pull Request #122 · DietrichGebert/ponytail

Lakshya77089 · 2026-06-16T17:23:05Z

Addresses the documentation gap raised in #121.

The headline benchmark is single-shot completions, but a Notes bullet generalized "the cost gap widens further in ponytail's favor" to real sessions. The independent Cursor-SDK A/B in #121 shows that's not universal: on large completion-forced tasks, ponytail ON can raise tool calls and tokens (more reading before writing) while shrinking the written output — net cost can go either way. The cost win is clearest on snowball-prone / blocked tasks.
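To make "net cost can go either way" concrete, here is a minimal sketch of the arithmetic. Every number in it is made up for illustration: the per-million-token prices and the token counts are assumptions, not figures from #121 or from the ponytail benchmark. It only shows that halving output can either outweigh or be swamped by extra reading.

```python
from dataclasses import dataclass


@dataclass
class Run:
    input_tokens: int   # tokens read: prompt, file reads, tool results
    output_tokens: int  # tokens written by the model


def cost(run: Run, price_in: float, price_out: float) -> float:
    """Dollar cost of a run, given prices per million tokens."""
    return (run.input_tokens * price_in + run.output_tokens * price_out) / 1_000_000


# Hypothetical prices (USD per million tokens); not any real provider's rates.
PRICE_IN, PRICE_OUT = 3.0, 15.0

# Hypothetical runs; none of these counts come from a real benchmark.
off = Run(input_tokens=200_000, output_tokens=40_000)
on_reads_a_bit_more = Run(input_tokens=260_000, output_tokens=20_000)
on_reads_much_more = Run(input_tokens=400_000, output_tokens=20_000)

for name, run in [
    ("OFF", off),
    ("ON, a bit more reading", on_reads_a_bit_more),
    ("ON, much more reading", on_reads_much_more),
]:
    print(f"{name}: ${cost(run, PRICE_IN, PRICE_OUT):.2f}")
```

Under these assumed numbers the first ON run is cheaper than OFF and the second is more expensive, even though both write half as much output.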

Changes (docs only)

Scope the existing single-shot cost claim so it no longer over-generalizes to agentic runs.
Add a "Where ponytail saves vs. costs" section splitting the two regimes (snowball-prone/blocked → saves; large completion-forced → can cost more).
Add the clarification from #121 (SDK benchmark: Ponytail ON → more tool calls & cost, fewer output lines; skillCount ≠ skills read) that an SDK's startup skillCount counts the skills available in the workspace, not the skills the model read; only a read of a SKILL.md counts as usage.
Credit and link @RicardoCostaGit's benchmark repo.
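The skillCount distinction can be sketched as follows. The event shape (`tool` and `path` keys), the startup payload, and the file paths are all hypothetical and are not the Cursor SDK's actual format. The point is only that availability comes from the startup count, while usage has to be derived from actual reads of a SKILL.md.

```python
def skills_read(tool_calls: list[dict]) -> set[str]:
    """Return the SKILL.md paths the model actually read during a run."""
    return {
        call["path"]
        for call in tool_calls
        if call.get("tool") == "read" and call.get("path", "").endswith("SKILL.md")
    }


# Hypothetical startup payload: three skills are available in the workspace.
startup = {"skillCount": 3}

# Hypothetical tool-call log for one run.
calls = [
    {"tool": "read", "path": "src/app.py"},
    {"tool": "read", "path": "skills/ponytail/SKILL.md"},
    {"tool": "write", "path": "src/app.py"},
]

used = skills_read(calls)
print(f"available: {startup['skillCount']}, read: {len(used)}")
```

In this made-up log three skills are available but only one SKILL.md was read, so reporting skillCount as usage would overstate it.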

No code change, so check-rule-copies.js and npm test are unaffected.

The headline benchmark is single-shot completions; a Notes bullet generalized 'the cost gap widens further in ponytail's favor' to real sessions. Independent multi-turn Cursor-SDK A/B (DietrichGebert#121) shows that on large completion-forced tasks ponytail ON can raise tool calls and tokens (more reading before writing) while shrinking output — net cost either way; the win is clearest on snowball-prone/blocked tasks. Scope the single-shot claim and add a 'saves vs. costs' section, plus the skillCount-is-availability-not-usage clarification.

DietrichGebert · 2026-06-18T21:35:56Z

This is great and I want it merged with your name on it. The credit to Ricardo's benchmark repo is exactly right.

One snag: it conflicts now. Main already rewrote that "Cost reflects single-shot calls..." bullet under ## Notes (the later agentic-README/#115 work) to scope the claim and link #121, so your edit to that one line collides with it.

Easy fix on your end:

Rebase on main.
Drop your change to that single-shot cost bullet; main's version already covers it.
Keep your "Where ponytail saves vs. costs" section as-is; the RicardoCostaGit repo link and the skillCount note are the parts I most want landed.
Add Closes #121 to the PR description so the issue closes itself when this merges.

Ping me when it's pushed and I'll merge it. Thanks for the rigor on this one.

Lakshya77089 mentioned this pull request Jun 16, 2026

SDK benchmark: Ponytail ON → more tool calls & cost, fewer output lines; skillCount ≠ skills read #121

Open

Lakshya77089 wants to merge 1 commit into DietrichGebert:main from Lakshya77089:bench-regime-note
