
docs(benchmarks): note ponytail's cost is task-shaped in agentic runs (#121) #122

Open
Lakshya77089 wants to merge 1 commit into DietrichGebert:main from Lakshya77089:bench-regime-note



@Lakshya77089

Contributor

Addresses the documentation gap raised in #121.

The headline benchmark is single-shot completions, but a Notes bullet generalized "the cost gap widens further in ponytail's favor" to real sessions. The independent Cursor-SDK A/B in #121 shows that's not universal: on large completion-forced tasks, ponytail ON can raise tool calls and tokens (more reading before writing) while shrinking the written output — net cost can go either way. The cost win is clearest on snowball-prone / blocked tasks.
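To make "net cost can go either way" concrete, here is a minimal sketch of the arithmetic. Everything in it is an assumption for illustration: the per-token prices, the token counts, and the names `PRICE_PER_MILLION`, `runCost`, and `netDelta` are hypothetical and are not taken from the benchmark or from #121. It also assumes that tool-call results are billed as input tokens.

```javascript
// Hypothetical prices in USD per million tokens (assumed, not from the benchmark).
const PRICE_PER_MILLION = { input: 3, output: 15 };

// Cost of one run in micro-USD. Integer math keeps the results exact.
// Assumption: tool-call results are counted inside inputTokens.
function runCost({ inputTokens, outputTokens }) {
  return (
    inputTokens * PRICE_PER_MILLION.input +
    outputTokens * PRICE_PER_MILLION.output
  );
}

// Positive result: ponytail ON cost more. Negative result: ponytail ON cost less.
function netDelta(off, on) {
  return runCost(on) - runCost(off);
}

// Made-up case A: large completion-forced task.
// ON reads much more before writing and only trims the output modestly.
const largeTask = netDelta(
  { inputTokens: 200000, outputTokens: 20000 }, // ponytail OFF
  { inputTokens: 300000, outputTokens: 10000 }  // ponytail ON
);

// Made-up case B: snowball-prone task.
// OFF writes far more output, so the smaller ON output dominates.
const snowballTask = netDelta(
  { inputTokens: 200000, outputTokens: 60000 }, // ponytail OFF
  { inputTokens: 220000, outputTokens: 15000 }  // ponytail ON
);

console.log({ largeTask, snowballTask });
```

With these invented numbers the sign of the delta flips between the two cases, which is the only point of the sketch: whether the extra reading or the smaller output dominates depends on the task shape and on the input/output price ratio.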

Changes (docs only)

No code change, so check-rule-copies.js and npm test are unaffected.

The headline benchmark is single-shot completions; a Notes bullet
generalized 'the cost gap widens further in ponytail's favor' to real
sessions. Independent multi-turn Cursor-SDK A/B (DietrichGebert#121) shows that on
large completion-forced tasks ponytail ON can raise tool calls and
tokens (more reading before writing) while shrinking output — net cost
either way; the win is clearest on snowball-prone/blocked tasks. Scope
the single-shot claim and add a 'saves vs. costs' section, plus the
skillCount-is-availability-not-usage clarification.
@DietrichGebert

Owner

This is great, and I want it merged with your name on it. The credit to Ricardo's benchmark repo is exactly right.

One snag: it conflicts now. Main already rewrote that "Cost reflects single-shot calls..." bullet under ## Notes (the later agentic-README/#115 work) to scope the claim and link #121, so your edit to that one line collides with it.

Easy fix on your end:

  1. Rebase on main.
  2. Drop your change to that single-shot cost bullet; main's version already covers it.
  3. Keep your "Where ponytail saves vs. costs" section as-is. The RicardoCostaGit repo link and the skillCount note are the parts I most want landed.
  4. Add Closes #121 to the PR description so the issue closes itself when this merges.

Ping me when it's pushed and I'll merge it. Thanks for the rigor on this one.

