Curated list of free LLM APIs, coding copilots, AI IDEs, agents, and infrastructure tools for building real AI applications.
- β Free GPT-5 / Claude / Gemini API access
- π€ Coding copilots and AI-native IDEs (Cursor, Trae, Windsurf)
- π° Cheapest AI APIs ($0.08-0.50 per 1M tokens)
- π RAG stack tools (vector DBs, embeddings, frameworks)
- π― Agent frameworks and automation tools
- π Local models for privacy (Ollama, Llama, Qwen)
- ποΈ Production-ready stack configurations
- π Claude Opus 4.7/4.8, Sonnet 4.6, Haiku 4.5 β GitHub Copilot AI Credits β Windsurf Max β Trae Ultra β OpenCode 167kβ β Kiro Cloud Agent β Xiaomi MiMo V2.5 Pro
Goal: Help developers build AI apps without paying $200/month.
Note
Please don't abuse these services, else we might lose them for everyone. The numebr becomes 550+ when you add all the models and sub services of all the tools provided. When raising issues or pull requests please dont add your own paid,expensive personal projets.
Warning
April 2026 Model Tier Changes: Major providers (OpenAI, Anthropic, Google) have restricted flagship models (GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro) to paid tiers. Free tiers now get lighter models (GPT-4o, Claude Sonnet/Haiku, Gemini Flash). Entries marked with [verify] need confirmation.
June 2026 Pricing & Billing Updates: Windsurf switched to a quota-based model (Pro $20, Teams $40, new Max $200) on Mar 18. Trae moved to a 5-tier token system (Lite $3, Pro $10, Pro+ $30, Ultra $100) on Feb 24. Qoder's 50% launch promo ended Apr 30 β standard pricing is now Pro $20, Pro+ $60, Ultra $200. GitHub Copilot moved to usage-based billing (GitHub AI Credits) on Jun 1, with a new Max tier at $100. Anthropic added Claude Opus 4.7/4.8, Sonnet 4.6, and Haiku 4.5. Xiaomi MiMo V2.5 Pro API permanently cut 99% (May 26) β $0.435/$0.87 with $0.0036 cache.
Most AI tool lists are:
- β Outdated (prices/limits from 2023)
- β Filled with affiliate links and sponsored placements
- β General-purpose directories with no developer focus
- β Missing production-critical details (rate limits, commercial use, architecture patterns)
This repo focuses only on:
- β Tools developers actually use in production
- β Generous free tiers (no "5 requests then paywall")
- β Production-capable models (SWE-bench verified, not toys)
- β Real infrastructure (APIs, hosting, vector DBs, not just chatbots)
- β Minimal fluff, maximum utility
Unlike: awesome-ai (general list), ai-collection (marketing focus), toolify (affiliate-heavy)
This is for: Builders who want to ship AI features this week.
If this repo helped you build something or saved you money:
β Star this repo β it helps more builders discover free AI resources.
[π Share with your team] β spread the knowledge.
π Contribute β found a new free tier? Updated pricing? PRs welcome!
2026-06-16
- π Added OpenCode (167kβ OSS CLI), AWS Kiro (full spec-driven family), Xiaomi MiMo Token Plan (Chinese coding subscription)
- π§Ή Removed weak/no-longer-free items from Free LLM providers: Cohere (non-commercial only), GitHub Models (Copilot-required), SambaNova/Hyperbolic (trial-only), HuggingFace (~$0.10/mo), Vercel ($5/mo), Mistral Codestral, Together AI, iFlow (7-day key), Perplexity API
- π Updated Gemini CLI entry: 3.1 Pro is paid-only; 3 Flash is the free tier (1,500 req/day)
- π Pricing refresh: Windsurf (Mar 18), Trae (Feb 24), Qoder (Apr 30), GitHub Copilot (Jun 1) billing changes
- β Added GitHub Copilot Max tier ($100/mo, $200 AI Credits) and Claude Haiku 4.5
- π Fixed stale Cursor / Qoder / Windsurf / GitHub Copilot pricing throughout
2026-05-18
- β¨ added github PR review tools
2026-04-12
- β¨ added a website for easy navigation
2026-04-11
- β¨ Initial release
- Quick Comparison
- Free LLM API Providers
- AI-Powered IDEs
- CLI Coding Tools
- API Providers for AI Coding Tools
- Paid Tiers Comparison
- Local Models
- free-coding-models CLI
- Additional 2026 AI Tools
- ποΈ Recommended Stacks
- β‘ Realtime & Streaming APIs
- ποΈ Speech Models
- π¨ Image Generation Models
- π¬ Video Generation APIs
- π AI Browser Automation
- πΎ Cheap Vector DB Hosting
- ποΈ Common AI Architecture Patterns
- π΅ Model Price Comparison
- π― Best Models by Use Case
- β±οΈ Rate Limit Comparison
- β Commercial Use Summary
- π§© RAG Stack Tools
- π’ Best Free Embedding APIs
- π₯οΈ AI Hosting & GPU Providers
- π AI Evaluation Tools
- π Structured Output Tools
- π·οΈ Legend
- Contributing
- License
| Provider | Models | Free Tier | Credit Card |
|---|---|---|---|
| NVIDIA NIM | 46 | 40 req/min | No |
| OpenRouter | 25 | 50/day (1K/day with $10) | No |
| Groq | 20+ | 1K-14.4K req/day | No |
| Google AI Studio | 9 | 5-500 req/day | No |
| Cloudflare Workers AI | 47+ | 10K neurons/day | No |
| Cerebras | 4 | 1M tokens/day | No |
| Mistral La Plateforme | 10+ | 1B tokens/month | No |
| IDE | Pro-grade Models | Free Tier Limit | Credit Card |
|---|---|---|---|
| Cursor | GPT-5.1-Codex-Max | Limited free tier (Hobby) | No |
| Trae | DeepSeek V4, GPT-4.1 (Claude removed Nov 2025) | 5,000 auto-completions/month | No |
| Windsurf | OpenAI, Anthropic, Google, xAI | Light quota (daily/weekly) | No |
| Qoder | Qwen3.6-Plus, Qwen3-Coder-480B, Claude, GPT, Gemini | Unlimited completions + limited chat | No |
| Tool | Starting Price | Free Tier | Features | Credit Card |
|---|---|---|---|---|
| PrixAI | Free / $10 paid plan | Free trial available | Unlimited reviews Auto-fix PRs, issue planning | No |
| Bito | Free / $25 paid plans | Free trial available | AI PR reviews/Unlimited reviews | No |
| Sourcery | ~$12/month | Free trial available | Code quality reviews | No |
| Tool | Pro-grade Models | Free Tier Limit | Credit Card |
|---|---|---|---|
| Gemini CLI | Gemini 3 Flash | 1,500 req/day | No |
| Rovo Dev CLI | Claude Sonnet 4 [verify], GPT-5 preview [verify] | 5M tokens/day | No |
| Warp | GPT-4.1, Claude Opus 4.1 [verify] | 150 credits/month (first 2 mo), 75/mo after | No |
| GitHub Copilot | GPT-4.1, Claude Opus, Gemini | 50 chat + 2K completions/month [verify] | No |
| Jules | Gemini 2.5 Pro | 15 tasks/day | No |
| AWS Kiro | Claude Opus 4.7/4.8, Sonnet 4.5/4.6, Haiku 4.5 | 50 credits/month + 500 bonus | No |
| OpenCode | 75+ providers (BYOK) + Go bundle | Free (Zen) / Go $10/mo | No |
| Xiaomi MiMo | MiMo-V2.5-Pro, MiMo-V2.5, MiMo-V2-Omni | Free API credits | No |
| ForgeCode | 300+ models via OpenRouter | 10K tokens/day | No |
| Amazon Q Developer | Claude Sonnet 4 [verify] | 50 agentic req/month | Required |
| RooCode | Bring your own keys | Unlimited (BYOK) | No |
| Goose | Bring your own keys | Unlimited (BYOK) | No |
| OhMyPi | Bring your own keys | Unlimited (BYOK) | No |
Models achieving β₯60% on SWE-bench Verified:
| Model | SWE-bench | Provider |
|---|---|---|
| Claude Opus 4.6 | 84.2% | Anthropic |
| GPT-5.4 | 80.1% | OpenAI |
| Claude Sonnet 4.6 | 79.3% | Anthropic |
| Gemini 3.1 Pro | 77.4% | |
| Claude Opus 4.5 | 82.1% | Anthropic |
| Claude Opus 4.7 / 4.8 [verify] | ~85% [verify] | Anthropic |
| GPT-5.1-Codex-Max | 78.3% | OpenAI |
| Qwen3.6-Plus | 71.2% | Alibaba |
| Claude Sonnet 4.5 | 77.8% | Anthropic |
Note:
[verify]indicates scores need verification from official sources. Always check current benchmarks before making decisions.
Ready-made combinations for different use cases. Copy-paste these configurations.
| Layer | Tool | Why |
|---|---|---|
| IDE | Cursor Hobby / Qoder | Limited completions + chat |
| CLI | Gemini CLI (3 Flash) / Rovo Dev | 1,500 req/day Flash, 5M tokens/day Rovo |
| API | OpenRouter + Groq | 50 req/day + 14.4K req/day combo |
| Local | Ollama + Qwen3.6-Plus | Unlimited offline |
| Automation | n8n Self-hosted | Unlimited workflows |
| Vector DB | ChromaDB / LanceDB | Free local storage |
Total Cost: $0/month
| Layer | Tool | Speed |
|---|---|---|
| Inference | Groq / Cerebras | 2,000 tokens/sec (Cerebras) |
| Coding | Qwen3.6-Plus via Groq | 1,000 req/day (71.2% SWE) |
| Agent | OpenCode Zen | Big Pickle (72.0%), MiniMax M2.5 (80.2%) |
| Cache | DeepSeek V4 | $0.30/$0.50 per 1M, 90% cache discount |
| Edge | Cloudflare Workers AI | Global CDN |
Best for: Real-time apps, trading bots, live coding assistants
| Layer | Tool | Cost |
|---|---|---|
| IDE | Trae Lite | $3/mo ($5 basic usage + bonus) |
| IDE | Trae Pro | $10/mo ($20 basic usage + bonus, SOLO mode) |
| API | OpenRouter $10 | 1K req/day + BYOK 1M/month free |
| CLI | OpenCode | Free (BYOK) or Go $10/mo |
| CLI | Xiaomi MiMo Lite | $6/mo (60M credits, ~120 tasks) |
| CLI | Gemini CLI | v0.37.1 (Gemini 3.1 Pro/Flash) |
| Local | Ollama | Free |
| Embeddings | Jina AI | Free tier |
Total Cost: ~$10/month for pro-grade everything
| Layer | Tool | Privacy |
|---|---|---|
| Models | Ollama + Llama 3.3 / Qwen3-Coder | Runs locally |
| IDE | Continue.dev + VS Code | BYO local models |
| CLI | Aider + local Ollama | Git-integrated, offline |
| Chat UI | Open WebUI | Self-hosted ChatGPT alternative |
| Vector DB | ChromaDB / LanceDB | Local embeddings storage |
| Speech | Whisper (local) | Offline transcription |
Best for: Healthcare, legal, finance - any sensitive data
| Component | Tool | Role |
|---|---|---|
| Orchestrator | n8n / Gumloop | Workflow automation |
| Reasoning | DeepSeek R1 / DeepSeek V4 | Complex decision making |
| Execution | Qwen3.6-Plus | Code generation |
| Memory | ChromaDB / Supabase Vector | Long-term context |
| Embeddings | Jina Embeddings v3 (1M tokens/day free) | Semantic search |
| Monitoring | LangSmith | Trace agent steps |
Best for: Autonomous research assistants, code review bots, data processing pipelines
| Component | Tool | Purpose |
|---|---|---|
| Framework | LlamaIndex / LangChain | RAG orchestration |
| Vector DB | ChromaDB / Weaviate / Supabase | Document storage |
| Embeddings | E5-Mistral-7B (best accuracy) | Text vectorization |
| Chunking | LlamaIndex | Smart document splitting |
| Reranking | Cohere Rerank | Improve retrieval accuracy |
| LLM | Claude Sonnet 4.6 (79.3%) / GPT-5.4 | Answer generation |
| Eval | RAGAS | Measure RAG performance |
Best for: ExamAi, legal document analysis, knowledge bases
Limits: 20 RPM, 29 free models (262K context max, March 2026), models share quota
- Llama 3.3 70B β
- NEW: Nemotron 3 Super (262K context)
- NEW: MiniMax M2.5
- NEW: Devstral 2 (Apache 2.0)
- NEW: Gemma 3n family (mobile-optimized)
- qwen/qwen3.6-plus:free β
- Hermes 3 Llama 3.1 405B
- Llama 3.2 3B Instruct
- Mistral Small 3.1 24B
- Full list
Unified API gateway for 100+ LLMs. OpenAI and Anthropic SDK-compatible. China-friendly with Hong Kong direct access (100-300ms latency). No monthly fees, pay per token.
Limits: Not published | 1 free model
- GLM-4.7-Flash (200K context, 128K output, $0/M input, $0/M output)
Data is used for training when used outside UK/CH/EEA/EU.
Rate limits: Tier 1 (default): 250 RPD | Tier 2: Requires $250 spend + 30 days
| Model | Free Tier Limits |
|---|---|
| Gemini 3.1 Pro [verify: now paid] | 250 RPD (Tier 1) |
| Gemini 3 Flash | 1,500 RPD |
| All others | Check console |
Note: Data training outside UK/CH/EEA/EU still applies.
Phone number verification required. Models tend to be context window limited.
Limits: 1K credits signup, up to 5K total, 40 RPM (phone verify required)
- 46+ models including Llama 3.3 70B, Llama 4 Scout, Mistral Large, Qwen3 235B
Free tier requires opting into data training; phone verification required
Limits (per-model): 1 req/s, 500K tokens/min, 1B tokens/month
- Open and Proprietary Mistral models (Mistral Large 3, Small 3.1, etc.)
Routes to various supported providers.
Limits: $5/month
AI gateway with curated models. Free models may use data for improvement.
- Big Pickle Stealth (S+, 72.0% SWE-bench)
- MiniMax M2.5 Free (S+, 80.2% SWE-bench)
- MiMo V2 Pro/Omni/Flash Free
- Nemotron 3 Super Free
- GPT 5 Nano
- Trinity Large Preview Free
| Model | Limits |
|---|---|
| GPT-OSS 120B | 30 req/min, 60K tokens/min, 900 req/hour, 1M tokens/day |
| Llama 3.1 8B | Same limits as above |
| Qwen3-235B | Available via API |
| Model | Limits |
|---|---|
| Llama 3.1 8B | 14,400 req/day, 6K tokens/min |
| Llama 3.3 70B | 1,000 req/day, 12K tokens/min |
| Llama 4 Maverick/Scout | 1,000 req/day |
| Whisper Large v3/v3 Turbo | 7,200 audio-sec/min, 2,000 req/day |
| Qwen3-32B | 1,000 req/day, 6K tokens/min |
| Kimi K2 Instruct | 1,000 req/day, 10K tokens/min |
| GPT-OSS 20B/120B | 1,000 req/day, 8K tokens/min |
| And 15+ more |
Limits: 10,000 neurons/day
- @cf/aisingapore/gemma-sea-lion-v4-27b-it
- @cf/ibm-granite/granite-4.0-h-micro
- @cf/openai/gpt-oss-120b, @cf/openai/gpt-oss-20b
- @cf/qwen/qwen3-30b-a3b-fp8
- @cf/zai-org/glm-4.7-flash
- DeepSeek R1 Distill Qwen 32B
- Deepseek Coder 6.7B Base/Instruct (AWQ)
- Deepseek Math 7B Instruct
- Gemma 2B/3 12B/7B Instruct (LoRA)
- Hermes 2 Pro Mistral 7B
- Llama 2 7B/13B Chat (FP16/INT8/AWQ/LoRA)
- Llama 3 8B Instruct, Llama 3.1 8B Instruct (AWQ/FP8)
- Llama 3.2 1B/3B/11B Vision Instruct
- Llama 3.3 70B Instruct (FP8), Llama 4 Scout Instruct
- Mistral 7B Instruct v0.1/v0.2 (AWQ/LoRA)
- Mistral Small 3.1 24B Instruct
- Qwen 1.5 0.5B/1.8B/7B/14B Chat (AWQ)
- Qwen 2.5 Coder 32B Instruct, Qwen QwQ 32B
- Phi-2, SQLCoder 7B 2
- And more...
| Provider | Credits | Duration | Notes |
|---|---|---|---|
| Fireworks | $1 | Permanent | Various open models |
| Baseten | $30 | Permanent | Pay by compute time |
| Nebius | $1 | Permanent | Various open models |
| Novita | $0.50 | 1 year | Various open models |
| AI21 | $10 | 3 months | Jamba family |
| Upstage | $10 | 3 months | Solar Pro/Mini |
| NLP Cloud | $15 | Permanent | Phone verification required |
| Alibaba Cloud | 1M tokens/model | 90 days | Qwen models |
| Modal | $5-30/month | Monthly | Pay by compute time |
| Inference.net | $1 (+$25 on survey) | Permanent | Various open models |
| Hyperbolic | $1 | Permanent | DeepSeek, Llama, Qwen, GPT-OSS |
| SambaNova Cloud | $5 | 3 months | Llama, Qwen, DeepSeek |
| Scaleway | 1M tokens | Permanent | DeepSeek, Llama, Mistral, Gemma |
| Provider | Models | Free Tier | Environment Variable |
|---|---|---|---|
| ZAI | 7 | Free tier (generous quota) | ZAI_API_KEY |
| SiliconFlow | 6 | 1K RPM, 50K TPM | SILICONFLOW_API_KEY |
| OVHcloud AI Endpoints | 8 | 2 req/min (no key), 400 RPM with key | OVH_AI_ENDPOINTS_ACCESS_TOKEN |
| Chutes AI | 4 | Free community GPU-powered | CHUTES_API_KEY |
| DeepInfra | 4 | 200 concurrent requests | DEEPINFRA_API_KEY |
| Replicate | 2 | 6 req/min (no payment), up to 3K RPM with payment | REPLICATE_API_TOKEN |
Full-featured integrated development environments with built-in AI assistance.
Model: GPT-5.1-Codex-Max (77.9% SWE-bench Verified) [verify]
- Free tier (Hobby): Limited Agent requests + Limited Tab completions/month + 1-week Pro trial
- Free models: Cursor Small, Deepseek v3, Gemini 2.5 Flash, GPT-4o mini (500/day limit), Grok 3 Mini Beta [verify: GPT-5.4 now paid-only]
- Claude models removed from free tier ~June 2025
- Credit-based billing since Jun 2025: each paid plan includes a credit pool equal to its price; Tab completions unlimited, Auto mode effectively unlimited, credits only deplete when you manually pick a premium model
- AI-powered code editor with autonomous coding capabilities
- Pro ($20/mo or $16/mo annually): $20/mo credit pool + Unlimited Tab completions + Auto mode
- Pro+ ($60/mo or $48/mo annually): $60/mo credit pool + 3x Pro usage + Background Agents
- Ultra ($200/mo or $160/mo annually): $400/mo credit pool (20x Pro) + Priority access
- Teams ($40/user/mo or $32/user/mo annually): Pro-equivalent per seat + Centralized billing + Usage analytics + SAML/OIDC SSO
- Enterprise (Custom): Everything in Teams + Pooled usage + SCIM + AI code tracking API + Audit logs
- Bugbot add-on: $40/user/month (Pro/Teams) β automated PR review
Pricing | GPT-5.1-Codex-Max Announcement
Models: DeepSeek V4, GPT-4.1, GPT-4o, Gemini 2.5 Pro (Claude models removed Nov 2025)
- New token-based pricing (effective Feb 24, 2026) β replaced the legacy "fast/slow request" model
- Free: Limited usage, 5,000 auto-completions/month, Standard queue
- Lite ($3/mo): $5 basic usage + bonus, Unlimited auto-completions
- Pro ($10/mo): $20 basic usage + bonus, Unlimited auto-completions, SOLO mode included, 10 concurrent cloud tasks
- Pro+ ($30/mo): $90 basic usage + bonus (4.5x Pro), 15 concurrent cloud tasks
- Ultra ($100/mo): $400 basic usage + bonus, Model early access, 20 concurrent cloud tasks
- 7-day free Pro trial (replaces the legacy $3 first-month deal)
- Annual: Pro $90/yr (
$7.5/mo), Pro+ $270/yr ($22.5/mo), Ultra $900/yr (~$75/mo) - On-Demand Usage: pay-as-you-go at API rates after basic + bonus usage is exhausted
- Migration bonus: $20 in dollar usage for current Pro users who manually switch (valid 90 days)
Models: OpenAI, Anthropic, Google, xAI model access
- New quota-based pricing (effective Mar 19, 2026) β replaced the legacy "prompt credits" model
- Daily + weekly usage allowance instead of monthly credit pool
- Existing paid subscribers are grandfathered at the old price but moved to the new quota system (with a free extra week to try it)
- Free ($0): Light quota + Unlimited Tab completions + 1 app deploy/day
- Pro ($20/mo): Standard quota + Full model access (Opus 4.6, GPT-5.4, Sonnet 4.6) + Purchase extra usage at API price
- ~7-27 messages/day on Premium Plus models (Opus 4.6, GPT-5.4, GPT-5.3-Codex)
- ~8-101 messages/day on Premium models (Sonnet 4.6, GPT-5.2, Gemini Pro)
- Max ($200/mo) β NEW Mar 2026: Heavy quota (~6x Pro) + Priority support
- ~42-170 messages/day on Premium Plus models
- ~291-1,190 messages/day on Lightweight models (Haiku, Flash)
- Teams ($40/user/mo): Standard quota per seat + Centralized billing + Admin dashboard + Priority support
- Enterprise ($60+/user/mo): Custom volume + SSO + Audit logs
Pricing | Pricing Announcement (Mar 18, 2026)
Models: Multi-agent (frontend/backend/testing agents)
- Agent-first IDE - new 2026 category
- Multiple specialized agents coordinate across codebase
- Free preview tier with high usage limits
- VS Code-based
Best for: Full-stack development with natural language direction
Models: Qwen3.6-Plus (71.2% SWE), Qwen-Coder-Qoder, GPT-4o, Claude Sonnet [verify: flagship models now paid-only]
- Free tier: Unlimited completions + limited chat/agent (basic models) + 2-week Pro trial (1,000 credits)
- Experts Mode: Multi-agent collaboration (new Mar 2026)
- Quest Mode: Fully autonomous app building
- Nextnew: Tab predictions
- Windows/macOS, VS Code-based
- 50% launch promo ended Apr 30, 2026 β now back to standard pricing
Pricing (standard, post-promo β effective Apr 30, 2026):
- Free: Basic models, limited messages
- Pro: $20/mo β 2,000 credits
- Pro+: $60/mo β 6,000 credits
- Ultra: $200/mo β 20,000 credits
- Teams: $40/seat/mo (was $30, +1,000 credits) β 3,000 credits/seat (was 2,000)
- Personal Add-on Credits: $20 for 1,000 credits (was $10)
- Credits: $0.02/credit, expire 1mo
- Teams new capabilities (rollout): BYOK, Security controls over MCP/Skills, Plugin management, Knowledge Engine
Docs | Pricing | Adjustment Notice
Models: Bring your own API keys (any provider)
- Open-source AI-powered coding assistant for VS Code
- Whole dev team of AI agents in your editor
- No subscription required - pay-as-you-go with your own keys
- Custom modes for different coding tasks
Model: Base model (Llama 3.1 70B), pro-grade models require subscription
- Individual plan: Free forever with unlimited code completions, AI chat, commands
- 70+ programming languages supported
- IDE integrations: VS Code, JetBrains, Vim/Neovim, Jupyter
- No credit card required
- Limited context awareness (expanded in paid tiers)
- Pro ($10/mo): Unlimited usage with advanced context awareness, Claude 3.5 Sonnet, GPT-4o access
- Teams ($12/user/mo): Pro features + team management
- Enterprise (Custom): On-premise deployment, custom models
Models: Local models + cloud models with limited quota
- AI Free tier included with IDEs
- Unlimited code completion and local model support
- Limited quota for cloud-based features
- 30-day AI Pro trial included
- Offline mode with local models via Ollama/LM Studio
- AI Pro ($15/mo): Increased cloud quota + unlimited local models
- AI Ultimate ($25/mo): Maximum cloud quota + advanced features
Models: Claude 3.5 Sonnet, GPT-4o, Llama 3.3 70B, proprietary models
- Free tier with limited features
- Basic AI code completions and chat (limited)
- Local processing available
- Context heavily limited in free tier
- 600+ programming languages supported
- Pro ($12/mo): Enhanced AI completions and chat
- Enterprise ($39/user/mo): Multiple LLMs, private deployment, on-premises and air-gapped options
SuperMaven β οΈ DISCONTINUED
Status: Shut down November 21, 2025 after acquisition by Cursor (Nov 2024)
Models: GPT-4o, Claude 3.5 Sonnet, GPT-4 (via chat interface)
- Free tier with basic features
- Basic code suggestions
- 7-day data retention limit
- Credit card required for registration
- 1M token context window
Historical Note: SuperMaven was acquired by Cursor in November 2024 and officially shut down in November 2025. Features were integrated into Cursor Tab. Users should migrate to Cursor or alternatives.
Models: Unspecified models
- $1 credit/mo = ~100K tokens (reduced Mar 2026)
- Specific model not publicly specified
- Credit card required
- $20/mo: 20M tokens/month
- $200/mo: 200M tokens/month
Models: Unspecified models
- 5 daily credits, max 30 per month (free)
- Models not publicly enumerated
- Credit card required
- Pro ($25/mo): 150 credits/month (5 daily credits)
- Teams ($30/mo): Higher limits (undisclosed)
Models: Proprietary models (not frontier)
- $5 in credits/month limit
- Uses proprietary models with varied routing
- Credit card required
- GPT-5 access requires v0 Premium subscription
General-purpose chat interfaces with free tiers.
| Platform | Free Model | Key Capabilities | Limitations |
|---|---|---|---|
| ChatGPT | GPT-4o / GPT-5.4-limited [verify] | Sora 3, DALL-E 4, GPT Store | ~20 msgs/3hr |
| Gemini | Gemini 3.1 Flash | 2M Context, 20 Deep Research/mo | Research quota |
| Claude | Claude Sonnet/Haiku [verify: Opus paid-only] | Technical reasoning | ~30 msgs/5h |
| Grok | Grok 4.2 | Aurora 2 images, voice | 15 msgs/12hr |
| Mistral Le Chat | Mistral Medium 3 | Structured output | Fewer integrations |
Notes:
- Aurora - xAI's image generation model (available in Grok)
- Sora 2 - OpenAI's video generation (integrated in ChatGPT)
- DALL-E 4 - OpenAI's latest image model (ChatGPT)
- Deep Research - Gemini's agentic research feature
Command-line tools for AI-assisted coding in your terminal.
Models: Gemini 3.1 Flash [verify: Pro now paid], Gemini 2.5 Pro
- Gemini 3.1 Pro latest version (v0.37.1 April 2026)
- 100 requests/day for Gemini 2.5 Pro (free tier fallback)
- 250 requests/day for Gemini 2.5 Flash
- No credit card required for free tier
- MCP server support, Google Search grounding
- Enable via
/settingsβ Preview features β true - Install:
npm install -g @google/gemini-cli
Rate Limits | Pricing | Gemini 3 Pro Announcement
Important
Rovo Dev CLI isnβt available during a Rovo Dev Standard trial. To use this feature, you need a paid Rovo Dev Standard subscription.
Models: Claude Sonnet 4 [verify], GPT-5 preview [verify]
- 5M tokens/day free tier
- No credit card required during beta
- Token limits reset at midnight UTC
- Jira/Confluence integration, MCP server support
- Requires Atlassian account
- Pro ($19.99/mo via Google AI Pro): 100 tasks/day, 5x higher limits, 5x concurrent tasks (15)
- Ultra (via Google AI Ultra): 300 tasks/day, 20x higher limits, 60 concurrent tasks, priority access to latest models
Models: GPT-4.1, Claude Opus 4.1 [verify], Claude Sonnet 4 [verify], Gemini 2.5 Pro
- 150 AI credits/month (first 2 months), then 75 AI credits/month
- No credit card required for basic signup
- AI-powered terminal with code generation
- Build ($20/mo): 1,500 AI credits/month
- Reload Credits available (up to 50% cheaper than old overage rates, roll over for 12 months)
- Bring Your Own API Key (BYOK) option available
- New pricing effective immediately for new customers (Oct 30, 2025)
- Existing monthly subscribers transition on first renewal after Dec 1, 2025
167k+ GitHub stars β’ 850+ contributors β’ 6.5M monthly users β’ Apache 2.0
Models: 75+ providers via BYOK β Anthropic, OpenAI, Google, Groq, AWS Bedrock, Azure, OpenRouter, local Ollama
- MIT/Apache 2.0 licensed β fork, customize, self-host
- Five agent modes (Tab-switchable): Build (full tools), Plan (read-only), Debug, Review, Docs
- LSP-driven self-correction β auto-spawns Language Server Protocol servers and feeds compiler diagnostics back to the model (unique among agentic CLIs)
- Multi-agent support: up to 10 parallel agents per workspace
- Desktop app (beta): macOS, Windows, Linux
- IDE extensions: VS Code and forks
- Local inference via Ollama: $0 β no data leaves your machine
- Latest release: v1.15.12 (May 28, 2026)
OpenCode Go (recommended for getting started): Subscription bundle of curated open-weight models
- $5 first month, then $10/mo (beta)
- Models included: GLM-5, Kimi K2.5, MiniMax M2.5, MiniMax M2.7, DeepSeek V4 Pro/Flash, Qwen3.7 Max, GLM-5.1, MiMo-V2.5-Pro
- Usage limits: $12/5h, $30/week, $60/month (cheaper models = more requests)
- ~78% slower than Claude Code on identical tasks but more thorough
- "Use balance" option falls back to your Zen credits when limits are hit
OpenCode Zen: Pay-per-request credits (PAYG from $20)
Install: curl -fsSL https://opencode.ai/install | bash β’ brew install opencode β’ npm install -g opencode-ai
GitHub | OpenCode Go Docs | Model Hub
Models: GPT-4.1, Claude Opus 3.5, Gemini 2.0 Flash, Grok Code Fast 1 (Free tier); GPT-5.1-Codex-Max available in Pro/Pro+/Max/Business/Enterprise only
- MAJOR: Usage-based billing effective Jun 1, 2026 β premium request units (PRUs) replaced by GitHub AI Credits (token-based)
- Sign-ups for Pro, Pro+, and Student temporarily paused since Apr 20, 2026 (existing customers unaffected)
- 50 agent mode or chat requests + 2,000 completions/month (Free tier)
- Agent Mode with autonomous multi-step coding
- No credit card required for Free
- Free Copilot Pro for students/educators (GitHub Student Pack, Copilot Pro for teachers/maintainers)
- Code completions and Next Edit suggestions remain included on all plans and do not consume AI Credits
- Pro ($10/mo): $15 monthly AI Credits + unlimited completions + cloud agent
- Pro+ ($39/mo): $70 monthly AI Credits + 1,500 premium req equivalent + Opus access
- Max ($100/mo) β NEW Jun 2026: $200 monthly AI Credits + Priority access to new models + 2.9x Pro+ usage
- Business ($19/user/mo): $19 in AI Credits (promo: $30 in Jun/Jul/Aug 2026) + unlimited completions
- Enterprise ($39/user/mo): $39 in AI Credits (promo: $60 in Jun/Jul/Aug 2026) + unlimited completions
- GPT-5.1-Codex-Max available in public preview (Dec 4, 2025) for Pro, Pro+, Max, Business, Enterprise - NOT in free tier
- Copilot code review now also consumes GitHub Actions minutes (in addition to AI Credits)
- Overage billing available at $0.04/request equivalent (token-based)
Plans Details | Usage-Based Billing Announcement (Apr 27, 2026) | Agent Mode
Model: Gemini 2.5 Pro
- 15 tasks/day free tier
- 3 concurrent tasks
- No credit card required
- Gmail account required (18+ years)
- Task limits reset on rolling 24-hour window
- Pro ($19.99/mo): 100 tasks/day, 5x higher limits, 5x concurrent tasks (15)
- Ultra (via Google AI Ultra): 300 tasks/day, 20x higher limits, 60 concurrent tasks, priority access to latest models
Usage Limits | Documentation | Google AI Plans
AWS's spec-driven agentic IDE and CLI β official replacement for Amazon Q Developer (EOL Apr 30, 2027; new signups stopped May 15, 2026)
Three Kiro products, one engine:
- Kiro IDE β desktop app built on Code OSS (VS Code foundation)
- Kiro CLI β terminal-driven workflows, CI/CD, headless automation (CLI 2.0 since Apr 2026)
- Kiro Cloud Agent β fully autonomous cloud version for delegating work via web interface
Models (all AWS Bedrock-hosted): Claude Opus 4.7 / 4.8 (experimental, May 2026), Claude Sonnet 4.5 / 4.6, Claude Haiku 4.5
- 50 credits/month (Free tier)
- 14-day welcome bonus: 500 credits
- No credit card required for Free
- Pro ($20/mo): 1,000 credits
- Pro+ ($40/mo): 2,000 credits
- Power ($200/mo): 10,000 credits
- $0.04/credit overage rate
- Spec-driven development:
requirements.mdβdesign.mdβtasks.mdin.kiro/specs/ - Agent Hooks (event-driven automation) + Steering Files (project-wide rules) + Powers
- IAM Policy Autopilot + native AWS MCP Server integration
- Built on Amazon Bedrock AgentCore + CloudFormation/CDK awareness
- v0.12 (May 2026) added parallel task execution
- Spec requests priced at $0.20/each, vibe requests at $0.04 (revised Aug 2025 β see pricing controversy)
- Enterprise tier comparison: Pro 1,000 / Pro+ 2,000 / Power 10,000 credits (opt-in overage)
Pricing | Kiro CLI Docs | Introduction Blog | Migration from Q Developer
Xiaomi's subscription plan for AI coding scenarios β bundled access to MiMo flagship models Compatible with OpenCode, OpenClaw, Claude Code, and other mainstream toolchains
Models: MiMo-V2.5-Pro, MiMo-V2.5, MiMo-V2.5-TTS, MiMo-V2-Omni, MiMo-V2-Pro, MiMo-V2-TTS (8 models total)
- No context-length multiplier β same rate for 10K or 500K context (big deal for agentic workflows)
- 1:2 credit ratio for Pro vs Omni models (consumed in parallel, not independently)
- TTS models free for limited time (do not consume package tokens)
- Night discount: 0.8x consumption (00:00β08:00 Beijing Time, 16:00β24:00 UTC)
- First-purchase discount: 12% off (one-time per account)
- First auto-renewal: 23% off (new) / 30% off (existing) β mutually exclusive with first-purchase
- Continuous annual subscription: 12% discount
Monthly Pricing:
| Tier | Price (USD) | Price (CNY) | Monthly Credits | ~Tasks/mo |
|---|---|---|---|---|
| Lite | $6/mo | Β₯39/mo | 60M | ~120 medium-complexity |
| Standard | $16/mo | Β₯99/mo | 200M | ~400 |
| Pro | $50/mo | Β₯329/mo | 700M | ~1,400 |
| Max | $100/mo | Β₯659/mo | 1.6B (was 1.6B; Max upgraded to 82B credits May 26, 2026 β 51x increase) | ~3,200 (or ~160,000+) |
Annual pricing: ~$63.36/yr (Lite), $168.96/yr (Standard), $528/yr (Pro), $1,056/yr (Max) β all with 12% annual-subscriber discount
API Pricing (permanently reduced 99% on May 26, 2026):
| Model | Input (per 1M) | Output (per 1M) | Cache Hit (per 1M) |
|---|---|---|---|
| MiMo V2.5 Pro | $0.435 | $0.87 | $0.0036 |
| MiMo V2.5 Standard | $0.20 | $0.60 | $0.002 |
Install: Get API key at platform.xiaomimimo.com β OpenAI-compatible endpoint at https://api.xiaomimimo.com/v1, model tag mimo-v2.5-pro
Subscription Docs | V2.5 Pro API Guide | BuyGLM Review (Apr 2026)
Model: Claude Sonnet 4 [verify] (AWS-hosted)
- 50 agentic requests/month limit (multi-turn conversations)
- Latest Claude models
- Credit card required
- Must upgrade to Pro for continued access
- Perpetual free tier
- Pro ($19/mo): Increased limits for agentic requests
- Business ($25/user/mo): Organization-wide admin + security
- Per-request overage: $0.003/code-generation request after 1,000 included
- End-of-support Apr 30, 2027 β replaced by AWS Kiro (new signups stopped May 15, 2026)
- Usage may be adjusted based on regional factors and usage patterns
See OpenCode CLI above for the full reference. Briefly: open-source (Apache 2.0) terminal agent, 75+ providers via BYOK, Go tier $5 first month / $10/mo.
Models: 300+ models via OpenRouter (Claude, GPT, O Series, Grok, DeepSeek, Gemini)
- AI-enabled pair programmer (Rust-based, Apache 2.0)
- Model-agnostic agent harness
- Semantic codebase search via
:sync - 10K tokens/day free tier
Models: Bring your own keys (any provider)
- AI coding agent for the terminal (Zig-powered)
- Hash-anchored edits, optimized tool harness
- LSP integration, Python support, browser automation
- Subagents with coordinated API rate limiting
- Multiplexer integration (tmux, GNU Screen, Zellij)
- Interrupt anytime workflow
Models: Any LLM (Claude, GPT, DeepSeek, etc.)
- Open-source extensible AI agent from Block (now AAIF/Linux Foundation)
- Desktop app, CLI, and API
- Active engineering tasks (not just code suggestions)
- Built for code, workflows, and automation
- Model-agnostic architecture
Models: Bring your own API keys (Claude, Gemini, GPT, etc.)
- Up to $25 signup credits (one-time bonus)
- Open source VS Code extension
- Pay-as-you-go with no markup on model pricing
- Credit card required to claim full bonus credits
- Full BYOK support
GitHub | Documentation | Pricing
Models: Bring your own API keys (any provider)
- Open-source AI-powered coding assistant for VS Code
- Whole dev team of AI agents in your editor
- No subscription required - pay-as-you-go with your own keys
- Custom modes for different coding tasks
- Previously known as Roo Cline
Models: Claude Sonnet 4 [verify], Opus 4.5 [verify: paid-only], Haiku 4.5
- Free tier available with limited usage
- Pro ($20/mo or $17/mo annually): Sonnet 4 access with more usage
- Max 5x ($100/mo): ~225 messages/5 hours
- Max 20x ($200/mo): ~900 messages/5 hours
- Extended thinking modes: "think" (~4K tokens), "megathink" (~10K), "ultrathink" (~32K)
- Usage limits reset weekly with 5-hour rolling windows
Model: GPT-5.1-Codex-Max (77.9% SWE-bench Verified)
- Free with ChatGPT Plus ($20/mo): 30β150 messages/5 hours
- ChatGPT Pro ($200/mo): 300β1,500 messages/5 hours
- Pay-as-you-go API: $1.25/$10 per million tokens (input/output)
- Free OSS mode: Access to open-source models only (via
--ossflag) - First model with "compaction" for multi-million token sessions (24+ hour tasks)
- 30% fewer thinking tokens than previous GPT-5.1-Codex
- Cross-platform: macOS 12+, Ubuntu 20.04+, Windows 11 via WSL2
GitHub Repo | GPT-5.1-Codex-Max Announcement
Models: Uses Claude Code for implementation
- Autonomous AI development pipeline β #1 Terminal Benchmark 2.0
- Turns GitHub issues into pull requests automatically
- Label an issue "pilot" β Pilot claims it β Creates branch β Plans β Implements β Quality gates β Opens PR
- Telegram bot integration available
- Desktop app available
- Install:
brew install qf-studio/tap/pilotorgo install github.com/qf-studio/pilot@latest
Models: Works with any LLM (Claude, ChatGPT, Cursor, Gemini, local models)
- AI memory system with highest LongMemEval score ever (96.6%)
- Uses ancient "memory palace" technique for AI conversations
- Stores conversations in structured format: wings (people/projects), halls (memory types), rooms (specific ideas)
- Raw verbatim storage without AI summarization
- Three mining modes: projects (code/docs), convos (conversation exports), general (auto-classified)
- MCP server with 19 tools for AI integration
- Local, open, adaptable β runs entirely on your machine
- Install:
pip install mempalace
Models: Bring your own API keys (200+ models supported)
- Free VS Code and JetBrains extension
- Full support for local models via Ollama/LM Studio
- Solo tier: Private/team/public visibility options
- Community hub for custom AI assistants
- No vendor lock-in or usage limits for local models
Models: Bring your own API keys (supports many providers)
- Free command-line assistant with built-in Git integration
- Works with GPT-4o, Claude Sonnet, DeepSeek, and local models
- Multi-file editing with repository context
- Voice-to-code support
- Use
/helpto see all commands
These services provide API access to coding-optimized models for tools like Cursor, Continue.dev, Cline, etc.
- 50 requests/day free tier (1,000/day with $10+ credits)
- Qwen3-Coder-480B, Qwen3-30B-A3B, Qwen3-235B-A22B, Gemini Flash
- 20 req/min rate limit for free tier
- OpenAI-compatible API
- 1.5M tokens/day free tier (expanded Feb 2026)
- 30 req/min, 8,192 token context
- Models: Qwen3.6-Plus-480B, Llama 3.1 70B
- Ultra-fast: 2,400 t/s (Qwen3.6)
- OpenAI-compatible API (works with Cursor, Continue.dev, Cline, RooCode, etc.)
- Paid tiers: Developer ($10+ self-serve), Enterprise (custom pricing)
Pricing | API Docs | Integrations
| IDE | Entry Tier | Credits/Requests | Key Features |
|---|---|---|---|
| Cursor | Pro ($20/mo) | $20/mo credit pool | Unlimited completions, Auto mode |
| Trae | Lite ($3/mo) / Pro ($10/mo) | $5 / $20 basic usage + bonus | SOLO mode, 5-tier token system |
| Windsurf | Pro ($20/mo) | Standard quota (daily/weekly) | Multi-provider, Max $200 tier |
| Qoder | Pro ($20/mo) | 2,000 credits | Quest Mode, Experts Mode |
| Codeium | Pro ($10/mo) | Unlimited | Claude 4.6 [verify], GPT-5.4 [verify] |
| Tabnine | Pro ($12/mo) | Enhanced completions | 600+ languages |
| JetBrains AI | AI Pro ($15/mo) | Increased cloud quota | Unlimited local models |
| Tool | Entry Tier | Credits/Requests | Key Features |
|---|---|---|---|
| Claude Code | Pro ($20/mo) | ~225 messages/5h | Sonnet 4.6 + Opus 4.6 [verify] |
| Warp | Build ($20/mo) | 1,500 credits/month | BYOK available |
| GitHub Copilot | Pro ($10/mo) | $15 monthly AI Credits | Usage-based billing since Jun 1, 2026 |
| OpenCode | Go ($10/mo) | $12/5h, $30/wk, $60/mo | Apache 2.0, 75+ providers, BYOK |
| AWS Kiro | Pro ($20/mo) | 1,000 credits | Spec-driven dev, replaces Q Developer |
| Xiaomi MiMo | Lite ($6/mo) | 60M credits | OpenCode/Claude Code compatible |
| Rovo Dev CLI | Jira Standard ($7.53/mo) | 20M tokens/day | 4x free tier |
| Jules | Pro ($19.99/mo) | 100 tasks/day | 5x free limits |
| OpenAI Codex CLI | ChatGPT Plus ($20/mo) | 30-150 msg/5h | GPT-5.1-Codex-Max |
| Amazon Q Developer | Pro ($19/mo) | Increased agentic limits | AWS-hosted Claude (EOL Apr 2027) |
| Kilo Code | Pay-as-you-go | Up to $25 signup credits | No markup on models |
Running open-weight frontier models locally provides unlimited coding assistance without API costs.
Popular Tools:
- Cline - VS Code extension with Plan/Act modes and MCP support
- Aider - Command-line assistant with Git integration
- Continue.dev - Open-source VS Code extension (200+ models)
Local Model Tools:
- Ollama - Run frontier models locally
- LM Studio - Easy desktop app for local LLMs (no terminal required)
Notable Local Models (2026):
- Qwen3.6-Plus-480B (71.2% SWE, ~150GB VRAM)
- Gemma 4 [verify] (Google, Apache 2.0, fully open-source)
- GLM-5.1 / GLM-5V-Turbo [verify] (Zhipu MoE-based SOTA coders)
- Devstral 2 (24B, Apache 2.0, agent-optimized)
- DeepSeek Coder V4 (lite version ~18GB)
- Codestral 2 (Mistral, 22B)
- GLM-4.9-Air (Chinese/English coding)
Note: Frontier models require substantial RAM/VRAM. See Unsloth Qwen3-Coder guide for details.
Update April 2026: Gemma 4 and GLM-5.1 families are new flagship open-source releases. Verify availability in Ollama/LM Studio before downloading.
Find the fastest free coding model in seconds. Ping 238 models across 25 providers in real-time.
npm install -g free-coding-models
free-coding-models- Parallel pings β all 238 models tested simultaneously
- Stability Score (0-100) β composite score from p95 latency, jitter, spike rate, uptime
- Smart ranking β top 3 highlighted π₯π₯π₯
- Favorites β star models with
F, persisted across sessions - Tool Integration β auto-configure OpenCode, Goose, Aider, Continue, Cline, etc.
- OpenCode Zen Models β 8 exclusive free models (Big Pickle, MiniMax M2.5 Free, MiMo V2, etc.)
# Most reliable model right now
free-coding-models --fiable
# Configure Goose with S-tier model
free-coding-models --goose --tier S
# NVIDIA top models only
free-coding-models --origin nvidia --tier S
# JSON output for scripting
free-coding-models --tier S --json | jq -r '.[0].modelId'| Flag | Launches |
|---|---|
--opencode |
π¦ OpenCode CLI |
--openclaw |
π¦ OpenClaw |
--goose |
πͺΏ Goose |
--aider |
π Aider |
--qwen |
π Qwen Code |
--continue |
|
--cline |
π§ Cline |
--gemini |
β Gemini CLI |
--rovo |
π¦ Rovo Dev CLI |
| And 8 more... |
| Tier | SWE-bench | Best For |
|---|---|---|
| S+ | β₯75% | Claude Opus 4.6 [verify], GPT-5.4 [verify] |
| S | 65-75% | Qwen3.6-Plus (71.2%), Claude Sonnet 4.6 [verify] |
| A+/A | 40β60% | Solid alternatives |
| A-/B+ | 30β40% | Smaller tasks |
| B/C | < 30% | Code completion |
All 238 models allow commercial use of generated output. You own what the models generate.
| License | Models | Commercial |
|---|---|---|
| Apache 2.0 | Qwen3/Qwen2.5 Coder, GPT-OSS 120B/20B, Devstral Small 2, Gemma 4, MiMo V2 Flash | β Unrestricted |
| MIT | GLM 4.5/4.6/4.7/5, MiniMax M2.1, Devstral 2 | β Unrestricted |
| Llama Community License | Llama 3.3 70B, Llama 4 Scout/Maverick | β Attribution required. >700M MAU β separate Meta license |
| DeepSeek License | DeepSeek V3/V3.1/V3.2, R1 | β Use restrictions on model (no military, no harm) β output is yours |
| NVIDIA Nemotron License | Nemotron Super/Ultra/Nano | β Updated Mar 2026, now near-Apache 2.0 permissive |
| MiniMax Model License | MiniMax M2, M2.5 | β Royalty-free, non-exclusive. Prohibited uses policy applies to model |
| Proprietary (API) | Claude (Rovo), Gemini (CLI), Perplexity Sonar, Mistral Large, Codestral | β You own outputs per provider ToS |
| OpenCode Zen | Big Pickle, MiMo V2 Pro/Flash/Omni Free, GPT 5 Nano, MiniMax M2.5 Free, Nemotron 3 Super Free | β Per OpenCode Zen ToS |
Key Points:
- Generated code is yours β no model claims ownership of your output
- Apache 2.0 / MIT models (Qwen, GLM, GPT-OSS, MiMo, Devstral Small) are the most permissive β no strings attached
- Llama requires "Built with Llama" attribution; >700M MAU needs a Meta license
- DeepSeek / MiniMax have use-restriction policies (no military use) that govern the model, not your generated code
- API-served models (Claude, Gemini, Perplexity) grant full output ownership under their terms of service
β οΈ Disclaimer: This is a summary, not legal advice. License terms can change. Always verify the current license on the model's official page before making legal decisions.
- Goal: Compare AI coding tools by their access to pro-grade models and free tier limits.
- What qualifies a model as "pro-grade"? Models must achieve β₯60% on SWE-bench Verified, demonstrating real-world software engineering capability. Current qualifying models: Claude Opus 4.5 (80.9% [verify]), GPT-5.1-Codex-Max (77.9% [verify]), Claude Sonnet 4.5 (77.2% [verify]), Gemini 3 Pro (76.2% [verify]), GPT-5 (74.9% [verify]), Claude Opus 4.1 (74.5% [verify]), Claude Sonnet 4 (72.7% [verify]), GPT-5 mini (71.0% [verify]), Qwen3-Coder-480B (69.6% [verify]), and Gemini 2.5 Pro (63.2% [verify]).
[verify]tag: Indicates information needs verification from official sources. Pricing, limits, and model availability change frequently.- Different limit types: Tools use various quota systems - requests, tokens, credits, chats - making direct comparison challenging. Check documentation for specifics.
- Real-world usage: Actual consumption varies dramatically based on coding style, task complexity, and tool implementation.
| Program | What You Get | Requirements |
|---|---|---|
| GitHub Student Pack | Free Copilot Pro for students | Verify with .edu email |
| GitHub Copilot Free | 50 chat + 2,000 completions/month | VS Code users |
| Copilot Pro for Teachers/Maintainers | Free Copilot Pro | Open source maintainers & educators |
Visual orchestration tools for building autonomous AI agents without coding.
| Platform | Free Tier | Best For | Key Features |
|---|---|---|---|
| Make (Integromat) | 1,000 ops/month | Visual builders | Drag-and-drop AI Agents, 3,000+ app integrations |
| n8n | Unlimited (self-hosted) | Technical teams | Self-hosted RAG systems, private data automation |
| Gumloop | 2,000 credits/month | No-code agents | Natural-language builder, "Gummie" troubleshooting agent |
| Relay.app | Generous free plan | Beginners | Simple agentic workflows |
| Activepieces | 1,000 tasks/month | Open-source | Flat pricing, self-hostable |
| Podium | Entry-level tiers | Sales/communication | 24/7 lead response AI agents |
| QuantFlow Pilot | Free | Autonomous development | #1 Terminal Benchmark 2.0 β AI that ships your tickets |
AI-powered tools for conversational data analysis and narrative visualization.
| Tool | Function | Free Tier Detail | Key Feature |
|---|---|---|---|
| Julius | Chat-with-data | Upload spreadsheets, generate instant visualizations | |
| Anomaly AI | AI Dashboards | Generate interactive dashboards from natural language | |
| Flourish | Data Storytelling | No-code interactive maps, "scrollytelling" features | |
| Datawrapper | Publishing | Publish-ready charts in seconds, journalism-focused | |
| Looker Studio | Marketing Data | Seamless Google Analytics/Ads integration | |
| Power BI Desktop | Microsoft reports | Copilot recommendations, local report building | |
| AI for Database | Natural language DB queries | Freemium - free tier available | Connect any DB (PostgreSQL, MySQL, MongoDB) and query in plain English β no SQL needed, with self-refreshing dashboards and workflow automation |
Professional-grade content creation with generous free tiers.
| Tool | Output | Free Tier | Key Capability |
|---|---|---|---|
| Veo | Video | Basic Free | Cinematic clips with realistic motion and sound |
| Sora 2 (via ChatGPT) | Video | Limited free tier | Deep ChatGPT integration, high-quality video |
| DALL-E 4 (via ChatGPT) | Image | Limited free tier | Latest OpenAI image model |
| Synthesia | Video Avatars | Free individual plan | "Video Agents" in 120+ languages |
| 1 More Shot | Music Videos | Free plan | Advanced lip-sync, frame-by-frame control |
| Leonardo.Ai | Images | 150 tokens/day (~70 images) | Commercial use allowed |
| Recraft AI | Vector/SVG | 30 credits/day | Infinitely scalable icons and logos |
| Ideogram | Images | 10-20 prompts/day | Perfect text rendering, "Magic Prompt" |
| Suno AI | Music | 50 credits/day (~10 tracks) | Complete songs with vocals and instruments |
| ElevenLabs | Voice | Basic Free | Realistic voice cloning |
| Canva AI | Design | Robust free tier | AI design assets, brochures, short videos |
| Tool | Function | Free Tier Detail | Key Feature |
|---|---|---|---|
| Grammarly | Writing | 100 AI prompts/month | Rewrites and tone detection |
| LanguageTool | Grammar | 10,000 characters/text | 25+ languages, open-source |
| Fathom | Meetings | Forever Free | Records/transcribes Zoom/Teams, auto-sync to CRM |
| NotebookLM | Research | Free | Audio Overview podcasts, grounded in your documents |
| Humata | PDF Analysis | 60 pages/month | Clickable source citations |
| QuillBot | Rewriting | 125 words/time | Fluency & Standard modes |
| DeepL | Translation | Basic Free | Incognito sensitive mode |
| MemoryPalace | AI Memory | Free, open source | 96.6% LongMemEval β memory palace technique for AI |
Medical AI:
| Tool | Pricing | Key Value |
|---|---|---|
| iatroX | Free | Adaptive Q-Bank, NICE/BNF clinical reference |
| DxGPT | Free | Diagnostic assistant (500K+ users, 6K doctors) |
| OpenEvidence | Free (US verified) | Evidence-grounded search, ambient note generation |
Legal AI:
| Tool | Pricing | Key Value |
|---|---|---|
| DocLegal.Ai | $10/month | Clause suggestion, risk detection |
| Doculex.ai | Varies | Case-data-driven drafting from medical records |
| Spellbook | 7-day trial | In-editor contract analysis |
| Harvey AI | Enterprise | Regulatory matters, high security |
| Tool | Function |
|---|---|
| Wellows | AI Visibility Score tracking across ChatGPT, Gemini, Perplexity |
| Google SGE Labs | See how AI Overviews interpret target keywords |
| NeuronWriter | AI content scoring |
| Surfer SEO | Content optimization |
| Jasper | AI copywriting with brand voice |
| Writesonic | Scalable copywriting |
| Tool | Function | Description |
|---|---|---|
| Open WebUI | Local Chat Interface | ChatGPT-like experience running entirely offline with Ollama |
| Whisper (OpenAI) | Speech-to-Text | Most accurate open-source transcription |
| Piper | Text-to-Speech | High-quality offline audio generation |
| ComfyUI | Image Generation | Node-based interface for Stable Diffusion |
| Zed | AI IDE | 50 AI prompts/month, native performance, high speed |
| Void IDE | Agent-first IDE | Multi-agent frontend/backend/testing |
| MemoryPalace | AI Memory System | 96.6% LongMemEval β memory palace technique for AI conversations |
Low-latency APIs for voice assistants, live coding copilots, trading tools, and realtime chat.
| Provider | Latency | Best For | Free Tier |
|---|---|---|---|
| Groq Streaming | ~50-150ms (0.4ms/token) | Live coding, chat | 14.4K req/day |
| OpenAI Realtime API | Low | Voice assistants, agents | No free tier (pay-per-use only, trial credits new accounts) |
| Gemini Live API | Low | Multimodal streaming | Dynamic caps (varies by prompt complexity) |
| Cerebras | 2,400 tok/sec (Qwen3.6) | Batch + streaming | 1.5M tokens/day |
| Cloudflare Workers AI | Edge | Global low-latency | 10K neurons/day |
| Provider | Type | Latency | Free Tier |
|---|---|---|---|
| Deepgram | STT streaming | ~300ms | $200 credits |
| AssemblyAI Streaming | Realtime STT | ~400ms | 50 hours/month |
| Groq Whisper | STT fast | ~200ms | 2,000 req/day |
| ElevenLabs Streaming | TTS streaming | ~100ms | 10K chars/month |
| OpenAI Realtime | STT + LLM + TTS | ~200ms | Limited |
Best for:
- Trading bots: Groq streaming (fastest)
- Voice assistants: OpenAI Realtime API (end-to-end)
- Live captions: AssemblyAI or Deepgram
- Realtime chat: Gemini Live API
Speech-to-text and text-to-speech models comparison.
| Model | Provider | Accuracy | Speed | Free Tier | Best For |
|---|---|---|---|---|---|
| Whisper Large v3 | OpenAI/Groq/Local | Excellent | Fast | 2,000 req/day (Groq) | General purpose, local |
| Deepgram Nova | Deepgram | Superior | Very Fast | $200 credits | Production, enterprise |
| AssemblyAI | AssemblyAI | Excellent | Fast | 50 hours/month | Streaming, diarization |
| Whisper API | OpenAI | Excellent | Medium | Pay-per-use | Reliable, consistent |
| Google Speech | Google Cloud | Good | Fast | 60 min/month | Google ecosystem |
| Whisper (local) | OpenAI/Ollama | Excellent | GPU-dependent | Unlimited offline | Privacy, cost control |
| Model | Provider | Quality | Speed | Free Tier | Best For |
|---|---|---|---|---|---|
| ElevenLabs | ElevenLabs | π Best | Fast | 10K chars/month | Voice cloning, pro voice |
| OpenAI TTS | OpenAI | Excellent | Fast | Pay-per-use | Reliable, cheap |
| Piper | Local | Good | Very Fast | Unlimited offline | Privacy, self-hosted |
| Bark | Suno/Local | Good | Medium | Free (local) | Expressive, local |
| Google TTS | Google Cloud | Good | Fast | 1M chars/month | Google ecosystem |
| WhisperSpeech | Local | Good | Fast | Unlimited | Whisper-based TTS |
| API | Input | Output | Latency | Use Case |
|---|---|---|---|---|
| OpenAI Realtime | Audio | Audio | ~200ms | Voice agents |
| Deepgram Voice | Audio | Text/Audio | ~300ms | Voice bots |
| AssemblyAI LeMUR | Audio | LLM response | ~1s | Voice RAG |
Comparison of image generation models and APIs.
| Model | Provider | Quality | Speed | Free Tier | Best For |
|---|---|---|---|---|---|
| FLUX.2 | Black Forest Labs | π Excellent | Fast | Local/Replicate | High quality, open |
| DALL-E 4 | OpenAI | π Best | Medium | ChatGPT Plus | Latest OpenAI |
| Ideogram 2.0 | Ideogram | Excellent | Fast | 20 prompts/day | Text in images |
| Recraft V4 | Recraft | Excellent | Fast | 50 credits/day | Vector/SVG output |
| Stable Diffusion XL | Stability AI | Good | Fast | Local/DreamStudio | Flexibility, local |
| Midjourney v6 | Midjourney | π Excellent | Slow | None (paid only) | Artistic, Discord |
| Leonardo.ai | Leonardo | Very Good | Fast | 150 tokens/day | Commercial use, gaming |
| Adobe Firefly | Adobe | Good | Fast | 25 credits/month | Safe, commercial |
| Imagen 3 | Excellent | Medium | Vertex AI trial | Photorealistic | |
| DiffusionBee | Local | Good | Fast | Local unlimited | Easy setup, open-source |
| ComfyUI | Local | Good | Fast | Local unlimited | Advanced, node-based |
| Provider | Model | Free Tier | Notes |
|---|---|---|---|
| Replicate | FLUX.1-schnell | Free tier | Fast inference |
| Pollinations | Various | Unlimited | No signup |
| HuggingFace | SDXL/FLUX | $0.10 credits | Inference API |
| Leonardo | Phoenix | 150 tokens/day | Commercial OK |
Text-to-video and image-to-video generation. Hot area in 2026.
| Model | Provider | Quality | Duration | Free Tier | Best For |
|---|---|---|---|---|---|
| Veo 3 | π Excellent | 1080p, 60s clips | Limited preview | Cinematic, realistic | |
| Sora 3 | OpenAI | π Excellent | 120s | ChatGPT Plus | High quality, physics |
| Runway Gen-3 | Runway | Excellent | 10 seconds | 3 free credits | Creative, filmmaking |
| Pika 3.0 | Pika | Very Good | 3-5 seconds | Free tier | Lip-sync improved |
| Luma Dream Machine | Luma | Very Good | 5 seconds | 30 generations/mo | Fast, realistic |
| Kling | Kuaishou | Excellent | 2-10 minutes | Limited | Long-form, Chinese |
| Hailuo AI | MiniMax | Good | 6 seconds | Free tier | Character consistency |
| Stable Video Diffusion | Stability | Good | 4 seconds | Local | Open, flexible |
| Provider | Cost per video | Generation time |
|---|---|---|
| Runway | ~$0.20-0.50 | 1-5 min |
| Pika | ~$0.10-0.30 | 30s-2 min |
| Luma | ~$0.30-0.60 | 2-5 min |
| Kling | ~$0.05-0.20 | 1-10 min |
Tools for AI agents to control browsers - web scraping, form filling, testing.
| Tool | Type | Pricing | Best For |
|---|---|---|---|
| Browserbase | Managed browsers | $5 free tier | Production agents |
| Steel.dev | Browser API | Free tier | AI-native browser control |
| Stagehand | AI browser framework | Open source | Next-gen Playwright |
| Playwright | Browser automation | Free | Reliable, well-documented |
| Puppeteer | Chrome automation | Free | Chrome-specific |
| Selenium | Cross-browser | Free | Legacy support |
| Scrapy | Web scraping | Free | Data extraction |
| Tool | AI Integration | Use Case |
|---|---|---|
| Stagehand | Natural language commands | AI agents controlling browsers |
| Browserbase | Session recording for AI | Training agent trajectories |
| Steel.dev | Built for LLM agents | Agent-native browser API |
Stack Recommendation:
- AI agents: Stagehand + Browserbase
- Web scraping: Playwright + Scrapy
- Testing: Playwright + AI assertions
Production-ready vector storage without high costs.
| Provider | Type | Free Tier | Paid | Best For |
|---|---|---|---|---|
| Supabase Vector | Postgres + pgvector | 500MB | $25/mo starter | Full-stack apps |
| Neon | Serverless Postgres | 500MB | $19/mo | Serverless, branching |
| Railway | Managed Postgres | $5 credits | Usage-based | Easy deployment |
| PlanetScale | MySQL + vectors | 5GB | $39/mo | Scale, branching |
| Chroma Cloud | Vector-native | Free tier | Usage-based | Pure vector workloads |
| Qdrant Cloud | Vector DB | 1GB | $25/mo | High performance |
| Pinecone | Managed vector | 2GB | $70/mo | Production, no ops |
| Weaviate Cloud | Vector DB | 5M vectors | $25/mo | Hybrid search |
| LanceDB | Embedded/Cloud | Free | Cloud beta | Multimodal |
| Database | Best For | Notes |
|---|---|---|
| ChromaDB | Prototyping | Simple, Python-native |
| Qdrant | Production | Rust-based, fast |
| Milvus | Enterprise | Scalable, complex |
| pgvector | Postgres apps | Just add extension |
| LanceDB | Embedded | No server needed |
Recommendation by Stage:
- MVP: ChromaDB (local) β Supabase (hosted)
- Production: Qdrant Cloud or Pinecone
- Enterprise: Milvus or Weaviate
Proven patterns for building AI applications.
User β Chat UI β LLM API β Response
β
Context Memory (Redis/Postgres)
Stack:
- Frontend: Next.js + Vercel AI SDK
- Backend: FastAPI + OpenRouter
- Memory: Upstash Redis or Supabase
Documents β Chunking β Embeddings β Vector DB
β
User Query β Embedding β Similarity Search β LLM β Response
Stack:
- Framework: LlamaIndex or LangChain
- Embeddings: BGE-Large or Jina v3
- Vector DB: ChromaDB (dev) β Pinecone (prod)
- LLM: Claude Sonnet [verify] or GPT-4o
User Request β Agent Controller β Tool 1 (Search)
β Tool 2 (Code exec)
β Tool 3 (API call)
β
Synthesize β Response
Stack:
- Framework: LangGraph, AutoGen, or CrewAI
- Tools: Function calling with Claude/GPT-4
- Memory: Vector DB + State management
- Monitoring: LangSmith or Arize
User Request β Router (classify intent)
β
βββββββββββββββββΌββββββββββββββββ
β β β
Cheap Model Medium Model Expensive Model
(GPT-5 Nano) (Claude Sonnet [verify]) (Claude Opus [verify])
β β β
Simple Q&A Complex task Hard reasoning
Implementation:
- Router: Fine-tuned classifier or LLM-based
- Cost optimization: Route 80% to cheap models
- Fallback: Escalate if cheap model fails
Audio Input β STT β LLM β TTS β Audio Output
β β β β
Deepgram Groq Claude ElevenLabs
Stack:
- STT: Deepgram or Whisper Streaming
- LLM: Groq for speed or OpenAI Realtime
- TTS: ElevenLabs or OpenAI TTS
- Latency target: <500ms end-to-end
Image Input β Vision LLM β Structured Output
β
Database / Action
Stack:
- Vision: GPT-4o Vision or Gemini 2.5 Pro
- Structured output: Instructor + Pydantic
- Storage: Postgres JSONB or MongoDB
Text Prompt β LLM Enhancement β Image Gen β Upscaling
β
Video Gen (optional)
Stack:
- Enhancement: GPT-4 or Claude
- Image: FLUX or DALL-E 3
- Upscale: Upscayl or Magnific
- Video: Runway or Pika
API pricing for budget planning. Sorted by input cost.
| Model | Provider | Input | Output | Cache Hit | Best For |
|---|---|---|---|---|---|
| MiniMax M2.6 | MiniMax | $0.08 | $0.12 | - | Bulk generation |
| DeepSeek V4 | DeepSeek | $0.28 | $0.55 | $0.03 π― | Coding, cached |
| GLM 4.9 Air | ZAI | $0.35 | $0.75 | - | Chinese/English |
| Gemini 3.1 Flash | $0.30 | $0.90 | - | 2M context | |
| GPT-5 Nano | OpenAI | $0.45 | $1.80 | - | Cheap reasoning |
| Qwen3-Coder | Alibaba | ~$0.60 | ~$1.20 | - | Strong agent tasks |
| Gemini 2.5 Pro | $1.25 | $10.00 | $0.625 | High quality, 1M context | |
| GPT-4.1 | OpenAI | $2.00 | $8.00 | - | General purpose |
| GPT-5.4 | OpenAI | $2.50 | $10.00 | $1.25 | Latest OpenAI model |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | $0.30 | Best coding, reasoning |
| Claude Sonnet 4.5 | Anthropic | $3.00 | $15.00 | $0.30 | Coding, agent workflows |
| Claude Opus 4.6 / 4.7 / 4.8 | Anthropic | $5.00 | $25.00 | $0.50 | Complex reasoning |
| Claude Fable 5 / Mythos 5 | Anthropic | $10.00 | $50.00 | $1.00 | Limited availability (Glasswing) |
| MiMo V2.5 Pro | Xiaomi | $0.435 | $0.87 | $0.0036 π― | Long-horizon agents, 1K+ tool calls |
π‘ Pro tip: DeepSeek's 90% cache discount makes it cheapest for repetitive tasks with long prompts.
β οΈ Anthropic lineup note: Claude Haiku 4.5 ($1/$5) was added in 2026 for high-frequency lightweight tasks. Claude Sonnet 4 (deprecated) retains $3/$15. Regional/multi-region Bedrock endpoints carry a 10% premium. Opus 4.7+ uses a new tokenizer that can use up to 35% more tokens for the same text.
Don't just use SWE-bench - match models to your specific task.
| Model | Why | Free Tier |
|---|---|---|
| Claude Sonnet 4.6 | 79.3% SWE-bench, excellent at following instructions | 25 msgs/5h (Claude Code) |
| Qwen3.6-Plus | 71.2% SWE-bench, Chinese + English, agent-optimized | 2,000 req/day |
| GPT-5.4 [verify: paid-only] | 80.1% SWE-bench, long context compaction | ChatGPT Plus/Pro |
| DeepSeek V4 | Near-Sonnet performance at 1/10th cost | DeepSeek API |
| Model | Why | Free Tier |
|---|---|---|
| DeepSeek R1 | Specialized reasoning model, math/logic | DeepSeek API |
| MiMo V2.5 Pro | Long-horizon agents (1K+ tool calls), 34x cheaper than GPT-5.5 | Xiaomi Token Plan ($6-$100/mo) |
| Claude Opus 4.6 / 4.7 / 4.8 | 84.2% SWE-bench (4.6), best for complex architecture | Claude Code Pro |
| Gemini 3.1 Pro | 77.4% SWE-bench, 2M context for deep analysis | 100 req/day |
| o3-mini / o1 | OpenAI reasoning models, step-by-step | ChatGPT Plus |
| Claude Fable 5 / Mythos 5 | Anthropic Glasswing (limited availability), top tier | API only |
| Model | Why | Cost per 1M |
|---|---|---|
| Gemini 2.5 Flash | 1M context, high throughput | ~$0.35/$1.00 |
| GPT-5 Nano | Newest cheap model from OpenAI | $0.50/$2.00 |
| GPT-4o | ChatGPT free tier model, fast | Variable (free tier) |
| GLM 4.5 Air | Good quality, extremely cheap | ~$0.40/$0.80 |
| MiniMax M2.7 | 80.2% SWE-bench, dirt cheap | $0.08/$0.12 |
| Model | Why | Free Tier |
|---|---|---|
| Claude Sonnet 4.6 | Best tool use, reliable agent behavior | Various |
| GPT-5.4 [verify: paid-only] | Compaction for 24+ hour sessions | ChatGPT Plus/Pro |
| Qwen3.6-Plus | Built for agentic workflows | 2,000 req/day |
| Big Pickle (OpenCode) | 72% SWE-bench [verify], agent-optimized | Zen Free tier |
| Model | Why | Free Tier |
|---|---|---|
| Gemini 2.5 Pro Vision | 1M token context for images/video | 20-100 req/day |
| GPT-4o | Best overall vision capabilities | ChatGPT Free |
| Claude 4 Vision | Detailed image analysis | Claude Free tier |
| Qwen2.5 VL | Strong open vision model | Hyperbolic |
| Model | Provider | Free Tier |
|---|---|---|
| Whisper Large v3 | Groq / Local | 2,000 req/day or unlimited local |
| ElevenLabs | ElevenLabs | Basic free tier |
| Piper | Local | Free, offline TTS |
Critical for scaling applications. Plan your architecture.
| Provider | RPM | TPM | Daily | Best For |
|---|---|---|---|---|
| Groq | 30 | Medium | 14,400 | High-throughput apps |
| Cerebras | 30 | 1,000,000 | 14,400 | Batch processing |
| Gemini Studio | 15 | High | 1,500 | Prototyping |
| OpenRouter | 20 | Medium | 50-1,000 | Flexible routing |
| Cloudflare | 300 | 10K neurons | 10K neurons | Edge deployment |
| Groq (varies) | 30-50 | 6K-30K | 1K-14.4K | Model-dependent |
| App Type | Recommended Stack |
|---|---|
| ExamAi (your app) | Cerebras (Qwen3.6-Plus) + Groq |
| AI Reel Generator | Gemini 3.1 Flash (video) + Groq (audio) |
| Trading AI | Groq + local Qwen3.6-Plus |
| Chatbot | OpenRouter + Gemini 3.1 Flash (cheap) |
| Code Review Bot | DeepSeek V4 (cheap) + Claude Sonnet [verify] (quality) |
Quick reference for legal safety.
| Provider | Commercial Use | Notes |
|---|---|---|
| OpenRouter | β Yes | All models |
| Groq | β Yes | All models |
| Gemini API | β Yes | Per Google ToS |
| Cohere | β Yes | 1K req/month free |
| Claude (API) | β Yes | Per Anthropic ToS |
| OpenCode Zen | β Yes | Per Zen ToS |
| DeepSeek | β Yes | No military use restriction |
| Qwen/Alibaba | β Yes | Apache 2.0 models |
| Ollama Local | β Yes | Fully offline |
β οΈ Always verify current ToS - licenses can change.
Build document Q&A systems like ExamAi.
| Tool | Best For | Free Tier |
|---|---|---|
| LlamaIndex | Production RAG | Open source |
| LangChain | Flexibility | Open source |
| Haystack | Enterprise | Open source |
| Vercel AI SDK | Edge RAG | Free tier |
| Database | Type | Free Tier | Best For |
|---|---|---|---|
| ChromaDB | Local | Unlimited | Prototyping, small apps |
| LanceDB | Local/Serverless | Generous | Multimodal, embeddings |
| Weaviate | Cloud/Local | 5M vectors | Production scale |
| Supabase Vector | Postgres | 500MB | Full-stack apps |
| Pinecone | Managed | 2GB (1 pod) | Production, no ops |
| Qdrant | Local/Cloud | 1GB cloud | High performance |
| Tool | Purpose |
|---|---|
| RAGAS | Evaluate retrieval quality |
| LlamaIndex Evals | Built-in RAG metrics |
| Arize Phoenix | Observability |
Essential for RAG - don't overlook these.
| Embedding | Provider | Dimensions | Free Tier | Best For |
|---|---|---|---|---|
| text-embedding-3-small | OpenAI | 1536 | 200K tokens/day | General purpose |
| Jina Embeddings v3 | Jina AI | 1024 | 1M tokens/day | Multilingual |
| BGE-Large-EN-v1.5 | HuggingFace/Local | 1024 | Free | High quality retrieval |
| E5-Mistral-7B | Various | 4096 | Varies | Best accuracy |
| Nomic Embed v1.5 | Nomic | 768 | Free tier | Long context (8K) |
| GTE-Large | Alibaba | 1024 | DashScope free | Chinese + English |
| Model | Size | Speed | Quality |
|---|---|---|---|
| BGE-Small | 33M | Fast | Good |
| MiniLM-L6 | 22M | Very Fast | Basic |
| Nomic Embed | 137M | Fast | Excellent |
Scale beyond free tiers.
| Provider | Type | Pricing | Best For |
|---|---|---|---|
| Modal | Serverless GPU | $5-30/month credits | Batch inference |
| RunPod | GPU Cloud | $0.20-0.50/hr | Training, fine-tuning |
| Vast.ai | Spot GPUs | Cheap spot prices | Budget inference |
| Lambda Labs | GPU Cloud | ~$0.60/hr A100 | Stable workloads |
| Beam.cloud | Serverless | Per request | Spiky traffic |
| Baseten | Model serving | $30 credits | Production models |
| Replicate | Model hosting | 6 req/min free | Quick deployment |
| Platform | Cold Start | Best For |
|---|---|---|
| Modal | Fast | Python functions |
| Beam | Fast | ML models |
| Replicate | Medium | Pre-built models |
| HuggingFace Inference | Medium | HF ecosystem |
Benchmark your models before production.
| Tool | Purpose | Free Tier |
|---|---|---|
| Promptfoo | Prompt testing, red-teaming | Open source |
| LangSmith | Tracing, evals | 5K traces/month |
| RAGAS | RAG evaluation | Open source |
| DeepEval | LLM unit testing | Open source |
| Arize Phoenix | Observability | Generous free tier |
| Weights & Biases | Experiment tracking | Academic free |
Force LLMs to return valid JSON/schemas.
| Tool | Approach | Best For |
|---|---|---|
| Instructor | Pydantic validation | Python apps |
| Guidance | Constrained generation | Complex schemas |
| Outlines | Regex/constrained | Fast inference |
| JSONformer | Structure-aware decoding | Local models |
| Zod + Vercel AI SDK | TypeScript validation | Web apps |
Quick reference for badges used in this guide.
| Badge | Meaning |
|---|---|
| π’ | No credit card required |
| π³ | Credit card required |
| β‘ | Fast inference (low latency) |
| π§ | Strong reasoning capabilities |
| π» | Coding optimized |
| π¦ | Open source / self-hostable |
| π | Privacy focused / local |
| π€ | Agentic capabilities |
| π― | Best value / cheap |
| π | Multilingual support |
[verify] |
Needs verification from official source |
If you spot an error, missing source link, or have updated quota/model information, please open an issue or pull request with a source.
No affiliation with any vendor. All trademarks belong to their owners. Information is for research; accuracy not guaranteed; limits/pricing change frequently.
- cheahjs/free-llm-api-resources (18.4k β) - Comprehensive free LLM API list
- mnfst/awesome-free-llm-apis (2.1k β) - Permanent free LLM API tiers
- inmve/free-ai-coding (648 β) - Pro-grade AI coding tools comparison
- Coding with AI - Practical techniques for coding with LLMs
- nowork-studio/awesome-ai-startups - A curated list of bootstrapped, pre-seed, and angel-funded AI products built by independent founders
This list was compiled and verified using:
- Gemini - For research and discovering new/additional AI tools
- Perplexity - For verifying information accuracy and checking if data is current
- Community repos - All referenced repositories above were used as reference sources
MIT Β© ShaikhWarsi
Last updated: June 16, 2026 β’ PRs/issues welcome