If Claude Code's pricing—$20 to $200 a month, with rate limits that reset every five hours—has you reconsidering your AI coding setup, there's a credible free alternative worth testing. Goose, an open-source agent from Block (Jack Dorsey's payments company), does much of what Claude Code does: it writes, runs, debugs, and orchestrates code across files. The difference is that it can run entirely on your own hardware, with the model of your choice, at zero cost.

The practical hook is model agnosticism. Goose connects to Anthropic's Claude, OpenAI's GPT-5, and Google's Gemini, or to hosted services like Groq and OpenRouter, but it can also run fully local using Ollama and open-source models. Local means no subscription, no usage caps, no rate limits, and no code leaving your machine. You can work offline, including on a plane. The project has momentum to match: 26,100+ GitHub stars, 362 contributors, and 102 releases, with version 1.20.1 shipping in January 2026.

Why does this matter now? Anthropic's rate limits have frustrated heavy users. The Pro tier ($17–20/month) caps you at roughly 10–40 prompts per five hours; even the $200 Max plan imposes weekly limits framed in vague "hours" that actually translate to token budgets—independent analysis pegs per-session limits near 44,000 tokens for Pro and 220,000 for Max. Anthropic says fewer than 5% of users are affected, but it hasn't clarified whether that's 5% of Max subscribers or all users. Either way, serious developers report hitting walls within 30 minutes of intensive work.

Getting started with a fully free setup takes three pieces. First, install Ollama from ollama.com and pull a model with good tool-calling support; the command ollama run qwen2.5 downloads the model on first use, and Qwen 2.5 works well for coding. Second, install Goose (desktop or CLI) from its GitHub releases; Block ships binaries for macOS, Windows, and Linux. Third, point Goose at Ollama: in the desktop app go to Settings → Configure Provider → Ollama and confirm the host is http://localhost:11434; in the CLI, run goose configure, select Ollama, and enter the model name. That's the whole loop: an autonomous agent running locally with no fees.
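
Before pointing Goose at Ollama, it can help to confirm that the Ollama server is actually listening on the host above and that the model is installed. The sketch below is one way to check, assuming Ollama's HTTP endpoint /api/tags returns a JSON object with a "models" list whose entries carry a "name" field such as "qwen2.5:latest". The helper names (installed_models, has_model, fetch_tags) are hypothetical and are not part of Goose or Ollama.

```python
import json
import urllib.request

# Default Ollama host, as given in the setup steps above.
OLLAMA_HOST = "http://localhost:11434"


def installed_models(tags_payload: dict) -> list[str]:
    """Return model names from an /api/tags response body (assumed shape)."""
    return [m["name"] for m in tags_payload.get("models", [])]


def has_model(tags_payload: dict, wanted: str) -> bool:
    """True if an installed model matches `wanted`, ignoring any ':tag' suffix."""
    return any(
        name.split(":")[0] == wanted for name in installed_models(tags_payload)
    )


def fetch_tags(host: str = OLLAMA_HOST, timeout: float = 3.0) -> dict:
    """Ask the local Ollama server which models it has.

    Raises OSError (including urllib.error.URLError) if the server is down.
    """
    with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

With Ollama running, has_model(fetch_tags(), "qwen2.5") should return True once the model has been downloaded. If fetch_tags raises an OSError, the server is probably not running or is listening on a different host.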

Budget your hardware honestly. Block recommends 32GB of RAM as a solid baseline for larger models; on Macs that's unified memory, on Windows/Linux a discrete NVIDIA GPU's VRAM does the heavy lifting. You don't need a workstation to start—smaller Qwen 2.5 variants run on 16GB—but an 8GB MacBook Air will struggle. Start small to validate your workflow, then scale up.
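
A rough back-of-envelope calculation shows why memory is the constraint. The sketch below estimates a model's footprint as parameter count times bits per weight, plus a margin for runtime state. The 4-bit quantization, the 20% overhead factor, and the 4GB reserved for the operating system are illustrative assumptions rather than figures from Block or Ollama, and real usage varies with context length and runtime.

```python
def estimate_model_memory_gb(
    params_billion: float,
    bits_per_weight: float = 4.0,   # assumption: 4-bit quantized weights
    overhead_factor: float = 1.2,   # assumption: ~20% extra for cache and buffers
) -> float:
    """Approximate memory needed to load a model, in gigabytes."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead_factor / 1e9


def fits(params_billion: float, ram_gb: float, reserved_gb: float = 4.0) -> bool:
    """True if the estimate fits in RAM after reserving some for the OS (assumed 4GB)."""
    return estimate_model_memory_gb(params_billion) <= ram_gb - reserved_gb


if __name__ == "__main__":
    # Compare a 7B and a 32B model (both sizes exist in the Qwen 2.5 family)
    # against common laptop memory configurations.
    for size in (7, 32):
        for ram in (8, 16, 32):
            need = estimate_model_memory_gb(size)
            print(f"{size}B needs ~{need:.1f} GB; fits in {ram} GB: {fits(size, ram)}")
```

Under these assumptions a 7B model needs a little over 4GB, which is comfortable on 16GB but tight on 8GB, while a 32B model needs roughly 19GB and only fits at the 32GB baseline.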

Know the trade-offs before you commit. Claude Opus 4.5 still leads on the hardest tasks, follows nuanced instructions better, and tops the Berkeley Function-Calling Leaderboard for tool use. Claude's API context window (up to one million tokens) dwarfs the typical local default of 4K–8K tokens. Cloud inference is faster, and Anthropic's tooling, such as prompt caching and structured outputs, is more polished. But the gap is closing fast: open models like Kimi K2 and GLM 4.5 now benchmark near Claude Sonnet 4. For developers who value cost, privacy, offline access, and control more than polish and peak quality, Goose is a real option you own outright.