Claude Proxy: Turn Your Claude Code CLI into an OpenAI-Compatible A...

The Problem: Claude Max Has No API Key

You subscribe to Claude Max. You use Claude Code daily. It is fantastic. But the moment you try to plug Claude into an agent framework, an IDE plugin, or any tool that speaks the OpenAI API, you hit a wall: Claude Max has no raw API key.

Anthropic's API is a separate billing path. You pay per token there. Your Claude Max subscription, which gives you unlimited usage inside Claude Code, does not expose a bearer token you can paste into Continue.dev, Cursor, Aider, or your own scripts.

That is the exact gap claude-proxy fills.

claude-proxy is a ~300-line, zero-dependency Python HTTP server that wraps your local claude CLI and exposes a standard OpenAI-compatible /v1/chat/completions endpoint. Any tool that speaks OpenAI's API can now use your Claude Code session as the backend.

How It Works

The architecture is brutally simple:

OpenAI-API client ---> /v1/chat/completions ---> claude_openai_proxy.py ---> claude CLI ---> Anthropic

The proxy translates between two worlds. On the left, OpenAI's request format (messages array, model, stream, tools). On the right, the claude CLI's stdin/stdout protocol (transcript piped via -p mode with --output-format json).

The mapping is straightforward:

OpenAI concept	Claude CLI mapping
`system` messages	`--append-system-prompt`
Conversation history	Flattened transcript piped to stdin
`stream: true`	`--output-format stream-json --include-partial-messages`
Token counts	Parsed from CLI's `usage` block
Function calling	Prompt-based emulation (see below)

3 Commands to Run

1. Install the CLI. Make sure claude is installed and you have logged in at least once:

claude
# Complete the interactive auth flow. Then exit.

2. Start the proxy.

python3 claude_openai_proxy.py --port 8088
# -> http://127.0.0.1:8088/v1   (api key is ignored)

3. Point any client at it.

# OpenAI Python SDK
export OPENAI_BASE_URL=http://localhost:8088/v1
export OPENAI_API_KEY=any-non-empty-string

# Cursor / Continue.dev / Aider / LiteLLM
LLM_BASE_URL=http://localhost:8088/v1
LLM_API_KEY=ignored
LLM_MODEL=sonnet

That is it. No pip install. No Docker required (though a Dockerfile is included). No dependencies beyond Python 3 standard library.

Supported Models

The proxy exposes these model IDs, all passed directly to claude --model:

Model ID	Backend
`sonnet` (default)	Claude Sonnet 4.6
`opus`	Claude Opus 4.8
`haiku`	Claude Haiku 4.5
`claude-sonnet-4-6`	Full ID alias
`claude-opus-4-8`	Full ID alias
`claude-haiku-4-5-20251001`	Full ID alias

Set CLAUDE_PROXY_DEFAULT_MODEL=opus if you want Opus when the client omits the model field.

Tool / Function Calling: Prompt-Based Emulation

This is the clever part. The claude CLI, in -p mode, does not have native OpenAI-style tool calling. The proxy emulates it with a prompt protocol:

When tools are provided in the request, they are described in an injected system instruction that tells the model to emit a fenced tool_call {json} block.
Those blocks are parsed back into standard OpenAI choices[0].message.tool_calls[] with finish_reason: "tool_calls".
On the next turn, incoming role: "tool" results are rendered into the transcript so the model can produce a final answer.

False-refusal hardening. The claude CLI is itself Claude Code, with its own native tool registry. It occasionally treats a prompt-injected tool as "not available" and apologizes instead of calling it. The proxy guards against this with a retry mechanism: if tools were offered but the model refused without calling anything, the proxy retries once with a hard corrective nudge.

Concurrency Control

Each request spawns a claude CLI process that consumes 200-400 MB of RAM. On small hosts, too many concurrent requests cause swap thrashing or OOM kills. The proxy uses a bounded semaphore to cap concurrent CLI processes:

export CLAUDE_PROXY_MAX_CONCURRENCY=2   # safe for a ~2 GB box
export CLAUDE_PROXY_MAX_CONCURRENCY=0   # unlimited (use with caution)

Host Integration: Evonic Example

The repo includes a worked example for integrating with Evonic, an AI agent platform. Host-specific tool-call handlers live in modules/ and are auto-discovered (subclass BaseToolCallHandler, call register(), select via X-Toolcall-Module header).

This means you can wire a full agentic application where Evonic manages the orchestration and the proxy routes tool calls to your custom handlers. The integration guide covers the full setup: disabling native CLI tools, prompt-emulated tool calls, false-refusal hardening, and end-to-end wiring.

Running 24/7

systemd

sudo cp claude-proxy.service /etc/systemd/system/claude-proxy@.service
sudo systemctl daemon-reload
sudo systemctl enable --now claude-proxy@youruser

Docker

docker compose up -d --build
# or
docker build -t claude-proxy .
docker run --rm -p 8088:8088 -v "$HOME/.claude:/root/.claude" claude-proxy

The claude CLI needs your Claude Code credentials (~/.claude), so mount them. Never bake them into the image.

Limitations and Trade-offs

Important: Using a Claude Max subscription as a generic API backend for a third-party app may violate Anthropic's Terms of Service. This is unofficial, has no SLA, and may break on CLI updates. For production, a real Anthropic API key or OpenRouter is the supported path.

Other trade-offs to be aware of:

One CLI process per request. Stateless. Multi-turn context is replayed each call, which means long conversations incur token reprocessing overhead.
Tool calling is prompt-based, not native. Very large or complex tool schemas may need tuning. The protocol works well for normal agentic loops but is not as robust as Anthropic's native tool-use API.
Latency. Each request launches a fresh claude process. Expect a few hundred milliseconds of cold-start overhead compared to a persistent API connection.
No native vision. Image inputs go through the CLI's transcript mode, which has limits.

Who Is This For?

Claude Max subscribers who want to use their subscription with agent frameworks (LangChain, CrewAI, AutoGen) without paying twice.
IDE users who want Claude inside Continue.dev or Cursor's OpenAI-compatible provider.
Developers building internal tools on a budget who already pay for Claude Max and want to reuse that capacity.
Evonic users who need a drop-in Claude backend with custom tool handlers.

Comparison: claude-proxy vs Alternatives

Option	Requires API key?	Cost model	Tool calling
claude-proxy	No (uses CLI session)	Claude Max subscription	Prompt-based emulation
Anthropic API direct	Yes	Per-token	Native
OpenRouter	Yes	Per-token + margin	Native
LiteLLM proxy	Yes	Pass-through	Varies by provider

Get Started

Clone the repo and run it. It takes less than 60 seconds:

git clone https://github.com/rephapeng/claude-proxy.git
cd claude-proxy
python3 claude_openai_proxy.py --port 8088

Then point any OpenAI-compatible client at http://localhost:8088/v1. No API key needed. Paste anything non-empty. The proxy ignores it.

Star the repo if it saves you from paying for tokens twice. Contributions welcome.

Repository: github.com/rephapeng/claude-proxy License: MIT Language: Python 3 (stdlib only, zero dependencies) Lines of code: ~300 lines core proxy + extensible module system