Promoting a comment on another issue to a full issue: the truly “economical” option is to not pay other companies for VC-subsidized tokens.
Several of your current leaderboard models do not run on reasonably priced local hardware, but that is part of what makes them uneconomical. Surely a foundational goal here must be to guide the user to models whose cost will not balloon when their providers shift from market-share-grabbing efforts toward profitability.
I am currently running the model you kicked off the list (Qwen 3.6 35B) locally, and it’s acceptable for many tasks. Before that I tried gpt-oss (which, by the way, should be credited to OpenAI, not to Microsoft, which is just one of the company’s investors) and found it too slow at 120B on my modest local setup and a bit too stupid at 20B. In initial testing, Gemma 4 26B (A4B) is quick enough and fairly smart, but I’m not sure it makes the grade for coding.
For the runner, I’ve had the greatest success with LM Studio, especially its newish llmster daemon mode. It got me through a period when the more open and philosophically aligned Ramalama was broken on this hardware, but I’m now back on Ramalama after an OS upgrade gave it the kernel features and driver versions it wanted. Ramalama comes from the same group that brought you Podman, which, by the way, might help in constructing strong sandboxes for your agent.
Open WebUI provides a better web interface than the llama-server one built into Ramalama, but it’s mainly good for chat. One still needs a coding agent to act on local files.
A big early consideration when adding this feature: make API keys optional. OpenCode fights you on this, presuming every provider must have a key, but when the model runs on a private LAN with no hostile actors, running it “naked” (with no key at all) is perfectly fine. Please let me just hit Enter to give a null key.
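To make the ask concrete, here is a minimal sketch in Python. It assumes the agent talks to an OpenAI-compatible endpoint (such as those LM Studio and llama-server expose) using bearer-token auth; the function names `read_optional_api_key` and `build_headers` are hypothetical and not taken from this project or from OpenCode. An empty answer at the key prompt becomes "no key", and the `Authorization` header is then simply left out.

```python
from typing import Dict, Optional


def read_optional_api_key(raw: str) -> Optional[str]:
    """Hypothetical helper: treat an empty answer (user just hit Enter) as 'no key'."""
    key = raw.strip()
    return key or None


def build_headers(api_key: Optional[str]) -> Dict[str, str]:
    """Hypothetical helper: build headers for an OpenAI-compatible request.

    The Authorization header is added only when a key was actually given,
    so a keyless server on a private LAN never receives a dummy token.
    """
    headers = {"Content-Type": "application/json"}
    if api_key is not None:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers


# Usage: in a real prompt, the string would come from input("API key (Enter for none): ").
print(build_headers(read_optional_api_key("")))
print(build_headers(read_optional_api_key("sk-example")))
```

Leaving the header out entirely, rather than requiring a placeholder string, means the same code path serves both a keyless local server and a hosted provider that does require a key.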