The Problem: Claude Max Has No API Key
You subscribe to Claude Max. You use Claude Code daily. It is fantastic. But the moment you try to plug Claude into an agent framework, an IDE plugin, or any tool that speaks the OpenAI API, you hit a wall: Claude Max has no raw API key.
Anthropic's API is a separate billing path. You pay per token there. Your Claude Max subscription, which gives you unlimited usage inside Claude Code, does not expose a bearer token you can paste into Continue.dev, Cursor, Aider, or your own scripts.
That is the exact gap claude-proxy fills.
claude-proxy is a ~300-line, zero-dependency Python HTTP server that wraps your local
claudeCLI and exposes a standard OpenAI-compatible/v1/chat/completionsendpoint. Any tool that speaks OpenAI's API can now use your Claude Code session as the backend.
How It Works
The architecture is brutally simple:
OpenAI-API client ---> /v1/chat/completions ---> claude_openai_proxy.py ---> claude CLI ---> Anthropic
The proxy translates between two worlds. On the left, OpenAI's request format (messages array, model, stream, tools). On the right, the claude CLI's stdin/stdout protocol (transcript piped via -p mode with --output-format json).
The mapping is straightforward:
| OpenAI concept | Claude CLI mapping |
|---|---|
system messages | --append-system-prompt |
| Conversation history | Flattened transcript piped to stdin |
stream: true | --output-format stream-json --include-partial-messages |
| Token counts | Parsed from CLI's usage block |
| Function calling | Prompt-based emulation (see below) |
3 Commands to Run
1. Install the CLI. Make sure claude is installed and you have logged in at least once:
claude
# Complete the interactive auth flow. Then exit.
2. Start the proxy.
python3 claude_openai_proxy.py --port 8088
# -> http://127.0.0.1:8088/v1 (api key is ignored)
3. Point any client at it.
# OpenAI Python SDK
export OPENAI_BASE_URL=http://localhost:8088/v1
export OPENAI_API_KEY=any-non-empty-string
# Cursor / Continue.dev / Aider / LiteLLM
LLM_BASE_URL=http://localhost:8088/v1
LLM_API_KEY=ignored
LLM_MODEL=sonnet
That is it. No pip install. No Docker required (though a Dockerfile is included). No dependencies beyond Python 3 standard library.
Supported Models
The proxy exposes these model IDs, all passed directly to claude --model:
| Model ID | Backend |
|---|---|
sonnet (default) | Claude Sonnet 4.6 |
opus | Claude Opus 4.8 |
haiku | Claude Haiku 4.5 |
claude-sonnet-4-6 | Full ID alias |
claude-opus-4-8 | Full ID alias |
claude-haiku-4-5-20251001 | Full ID alias |
Set CLAUDE_PROXY_DEFAULT_MODEL=opus if you want Opus when the client omits the model field.
Tool / Function Calling: Prompt-Based Emulation
This is the clever part. The claude CLI, in -p mode, does not have native OpenAI-style tool calling. The proxy emulates it with a prompt protocol:
- When tools are provided in the request, they are described in an injected system instruction that tells the model to emit a fenced
tool_call {json}block. - Those blocks are parsed back into standard OpenAI
choices[0].message.tool_calls[]withfinish_reason: "tool_calls". - On the next turn, incoming
role: "tool"results are rendered into the transcript so the model can produce a final answer.
False-refusal hardening. The
claudeCLI is itself Claude Code, with its own native tool registry. It occasionally treats a prompt-injected tool as "not available" and apologizes instead of calling it. The proxy guards against this with a retry mechanism: if tools were offered but the model refused without calling anything, the proxy retries once with a hard corrective nudge.
Concurrency Control
Each request spawns a claude CLI process that consumes 200-400 MB of RAM. On small hosts, too many concurrent requests cause swap thrashing or OOM kills. The proxy uses a bounded semaphore to cap concurrent CLI processes:
export CLAUDE_PROXY_MAX_CONCURRENCY=2 # safe for a ~2 GB box
export CLAUDE_PROXY_MAX_CONCURRENCY=0 # unlimited (use with caution)
Host Integration: Evonic Example
The repo includes a worked example for integrating with Evonic, an AI agent platform. Host-specific tool-call handlers live in modules/ and are auto-discovered (subclass BaseToolCallHandler, call register(), select via X-Toolcall-Module header).
This means you can wire a full agentic application where Evonic manages the orchestration and the proxy routes tool calls to your custom handlers. The integration guide covers the full setup: disabling native CLI tools, prompt-emulated tool calls, false-refusal hardening, and end-to-end wiring.
Running 24/7
systemd
sudo cp claude-proxy.service /etc/systemd/system/claude-proxy@.service
sudo systemctl daemon-reload
sudo systemctl enable --now claude-proxy@youruser
Docker
docker compose up -d --build
# or
docker build -t claude-proxy .
docker run --rm -p 8088:8088 -v "$HOME/.claude:/root/.claude" claude-proxy
The claude CLI needs your Claude Code credentials (~/.claude), so mount them. Never bake them into the image.
Limitations and Trade-offs
Important: Using a Claude Max subscription as a generic API backend for a third-party app may violate Anthropic's Terms of Service. This is unofficial, has no SLA, and may break on CLI updates. For production, a real Anthropic API key or OpenRouter is the supported path.
Other trade-offs to be aware of:
- One CLI process per request. Stateless. Multi-turn context is replayed each call, which means long conversations incur token reprocessing overhead.
- Tool calling is prompt-based, not native. Very large or complex tool schemas may need tuning. The protocol works well for normal agentic loops but is not as robust as Anthropic's native tool-use API.
- Latency. Each request launches a fresh
claudeprocess. Expect a few hundred milliseconds of cold-start overhead compared to a persistent API connection. - No native vision. Image inputs go through the CLI's transcript mode, which has limits.
Who Is This For?
- Claude Max subscribers who want to use their subscription with agent frameworks (LangChain, CrewAI, AutoGen) without paying twice.
- IDE users who want Claude inside Continue.dev or Cursor's OpenAI-compatible provider.
- Developers building internal tools on a budget who already pay for Claude Max and want to reuse that capacity.
- Evonic users who need a drop-in Claude backend with custom tool handlers.
Comparison: claude-proxy vs Alternatives
| Option | Requires API key? | Cost model | Tool calling |
|---|---|---|---|
| claude-proxy | No (uses CLI session) | Claude Max subscription | Prompt-based emulation |
| Anthropic API direct | Yes | Per-token | Native |
| OpenRouter | Yes | Per-token + margin | Native |
| LiteLLM proxy | Yes | Pass-through | Varies by provider |
Get Started
Clone the repo and run it. It takes less than 60 seconds:
git clone https://github.com/rephapeng/claude-proxy.git
cd claude-proxy
python3 claude_openai_proxy.py --port 8088
Then point any OpenAI-compatible client at http://localhost:8088/v1. No API key needed. Paste anything non-empty. The proxy ignores it.
Star the repo if it saves you from paying for tokens twice. Contributions welcome.
Repository: github.com/rephapeng/claude-proxy License: MIT Language: Python 3 (stdlib only, zero dependencies) Lines of code: ~300 lines core proxy + extensible module system