[Chart: Wafer throughput vs base SGLang]

Wafer builds AI that optimizes AI: we take open models and make them dramatically faster. Wafer Pass currently gives you access to Qwen3.5-397B-A17B (running at 1.5–4x the speed of generic inference providers) and GLM-5.1. More fast open models land on the same subscription — no price increase.

Wafer Pass is built for Claude Code, OpenClaw, Hermes Agent, Cline, Roo Code, Kilo Code, OpenHands, and other agent harnesses. Wafer exposes both an OpenAI-compatible endpoint and an Anthropic-compatible Messages endpoint, so tools like Claude Code work out of the box.

Get a Wafer Pass for fast open-source models through a standard API endpoint. Plans start at $40/month.
Get your Wafer Pass: https://www.wafer.ai/pass
Wafer Pass is in early access. We’re onboarding developers in small batches. Features, availability, and pricing are subject to change.

Connection Details

Use the credentials from your Wafer access email with these values:
OpenAI-compatible endpoint: https://pass.wafer.ai/v1
Anthropic-compatible endpoint: https://pass.wafer.ai/v1/messages
Authentication: API key
Concurrency: 1 in-flight request per user
See Models below for the model strings to pass on the OpenAI-compatible endpoint. Anthropic-compatible tools (Claude Code) don’t need a model override — Wafer auto-routes.
Claude Code uses the Anthropic Messages endpoint. Set ANTHROPIC_BASE_URL=https://pass.wafer.ai and ANTHROPIC_API_KEY to your Wafer key — Claude Code will hit /v1/messages automatically and Wafer routes all requests to the fastest available model regardless of what model name Claude Code sends. All other harnesses (OpenClaw, Cline, Roo Code, etc.) use the OpenAI-compatible endpoint at https://pass.wafer.ai/v1.
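The two endpoint shapes differ only in path and auth header. A minimal sketch in Python (stdlib only) that builds one request of each kind; the payload fields follow the standard OpenAI Chat Completions and Anthropic Messages schemas, and YOUR_WAFER_API_KEY is a placeholder for the key from your access email:

```python
import json

BASE = "https://pass.wafer.ai"
API_KEY = "YOUR_WAFER_API_KEY"  # placeholder: use the key from your access email

def openai_request(model: str, prompt: str):
    """Build an OpenAI-compatible chat completion request as (url, headers, body)."""
    url = f"{BASE}/v1/chat/completions"
    headers = {"Authorization": f"Bearer {API_KEY}",
               "Content-Type": "application/json"}
    body = json.dumps({"model": model,
                       "messages": [{"role": "user", "content": prompt}]})
    return url, headers, body

def anthropic_request(prompt: str):
    """Build an Anthropic Messages request; Wafer ignores the model name and auto-routes."""
    url = f"{BASE}/v1/messages"
    headers = {"x-api-key": API_KEY,
               "anthropic-version": "2023-06-01",
               "Content-Type": "application/json"}
    body = json.dumps({"model": "any", "max_tokens": 1024,
                       "messages": [{"role": "user", "content": prompt}]})
    return url, headers, body
```

Send either with the HTTP client of your choice; note that the OpenAI style authenticates with a Bearer header while the Anthropic style uses x-api-key.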

What’s Included

With an active Wafer Pass subscription you get:
  • Qwen3.5-397B-A17B and GLM-5.1 requests with no per-token cost within your plan’s request allowance, optimized by the Wafer Inference Engine
  • Access through a standard OpenAI-compatible API and an Anthropic-compatible Messages API using your Wafer API key
  • Works with Claude Code, OpenClaw, Hermes Agent, Cline, Roo Code, Kilo Code, OpenHands, and other agent harnesses
  • 1 concurrent request today, with higher in-flight limits coming soon
  • New fast models as we release them — same subscription, no price increase

Models

Model string      | Family            | Context | Notes
Qwen3.5-397B-A17B | Qwen3.5, 397B MoE | 128K    | 1.5–4x faster than base SGLang (see chart above)
GLM-5.1           | Z.AI flagship     | 200K    |
Pass either model string to any OpenAI-compatible harness configured against https://pass.wafer.ai/v1. Claude Code and other Anthropic-compatible harnesses don’t need a model string — Wafer auto-routes.

Pricing

Pay monthly or save 20% with yearly billing.

Monthly

Plan    | Price   | Requests / 5hr window | Overage (input) | Overage (output)
Starter | $40/mo  | 1,000                 | $0.60/M tokens  | $4.00/M tokens
Pro     | $100/mo | 5,000                 | $0.40/M tokens  | $2.60/M tokens
Max     | $250/mo | 20,000                | $0.30/M tokens  | $2.00/M tokens

Yearly (20% off)

Plan    | Price     | Effective monthly | Requests / 5hr window | Overage (input) | Overage (output)
Starter | $384/yr   | $32/mo            | 1,000                 | $0.60/M tokens  | $4.00/M tokens
Pro     | $960/yr   | $80/mo            | 5,000                 | $0.40/M tokens  | $2.60/M tokens
Max     | $2,400/yr | $200/mo           | 20,000                | $0.30/M tokens  | $2.00/M tokens
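As a worked example of how the overage rates combine with a plan price (the usage numbers are hypothetical, and this assumes overage tokens are billed flat at the listed per-million rates):

```python
# Hypothetical month on the Starter plan ($40/mo):
# overage usage totals 10M input tokens and 1M output tokens.
PLAN_PRICE = 40.00   # Starter, monthly
INPUT_RATE = 0.60    # $ per 1M overage input tokens
OUTPUT_RATE = 4.00   # $ per 1M overage output tokens

overage_input_m = 10   # millions of overage input tokens
overage_output_m = 1   # millions of overage output tokens

overage_cost = overage_input_m * INPUT_RATE + overage_output_m * OUTPUT_RATE
total = PLAN_PRICE + overage_cost
print(f"overage ${overage_cost:.2f}, total ${total:.2f}")  # → overage $10.00, total $50.00
```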

Getting Started

1. Apply for access

Go to wafer.ai/pass and pick your plan. We’re onboarding in small batches and will notify you when your spot opens.
2. Receive your access email

Once you’re approved, we’ll send you your Wafer endpoint, model ID, and API key.
3. Start coding

Use those credentials in Claude Code, OpenClaw, Cline, Roo Code, Kilo Code, Hermes Agent, OpenHands, or any other supported harness.

Set Up Claude Code

Wafer exposes an Anthropic-compatible Messages endpoint at https://pass.wafer.ai/v1/messages, so Claude Code can connect directly — no proxy needed. For Claude Code, set ANTHROPIC_BASE_URL to https://pass.wafer.ai, not https://pass.wafer.ai/v1.
1. Install Claude Code

npm install -g @anthropic-ai/claude-code
2. Configure Wafer as the endpoint

Set these environment variables in your shell profile (~/.zshrc, ~/.bashrc, etc.):
export ANTHROPIC_BASE_URL=https://pass.wafer.ai
export ANTHROPIC_API_KEY=YOUR_WAFER_API_KEY
Or add them to ~/.claude/settings.json for a persistent, per-user config:
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://pass.wafer.ai",
    "ANTHROPIC_API_KEY": "YOUR_WAFER_API_KEY"
  }
}
Replace YOUR_WAFER_API_KEY with the key from your Wafer access email.
Do not share your API key or commit it to version control.
3. Start Claude Code

claude
Claude Code now routes requests through the Wafer endpoint. You don’t need to configure a model — Wafer automatically routes all requests to the fastest available model regardless of the model name Claude Code sends. To target a specific model (Qwen3.5-397B-A17B or GLM-5.1), use the OpenAI-compatible endpoint documented in the tool sections below.

Set Up OpenClaw

Model string: the examples below use Qwen3.5-397B-A17B. To use GLM instead, replace Qwen3.5-397B-A17B with GLM-5.1 anywhere a model ID appears. Both IDs are accepted on https://pass.wafer.ai/v1. This applies to every OpenAI-compatible setup section that follows (OpenClaw, Hermes Agent, Cline, Roo Code, Kilo Code, OpenHands, and the generic section at the bottom).
1. Install OpenClaw

curl -fsSL https://openclaw.ai/install.sh | bash
2. Run setup

openclaw setup
3. Add Wafer as a provider

Replace YOUR_WAFER_API_KEY with the key from your Wafer access email:
openclaw config set models.providers.wafer "$(cat <<'EOF'
{
  "baseUrl": "https://pass.wafer.ai/v1",
  "api": "openai-completions",
  "auth": "api-key",
  "apiKey": "YOUR_WAFER_API_KEY",
  "models": [{ "id": "Qwen3.5-397B-A17B", "name": "Qwen 3.5 397B" }, { "id": "GLM-5.1", "name": "GLM 5.1" }]
}
EOF
)"
openclaw models set wafer/Qwen3.5-397B-A17B
Do not share your API key or commit it to version control.
4. Test it

openclaw agent --local --session-id wafer-test --message "Hello"

Set Up Hermes Agent

1. Install Hermes Agent

curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
source ~/.bashrc   # or source ~/.zshrc
2. Point Hermes at Wafer

Replace YOUR_WAFER_API_KEY with the key from your Wafer access email:
hermes config set OPENAI_BASE_URL https://pass.wafer.ai/v1
hermes config set OPENAI_API_KEY YOUR_WAFER_API_KEY
hermes config set model Qwen3.5-397B-A17B
3. Start a session

hermes
Hermes now uses Qwen3.5-397B-A17B through the Wafer endpoint by default.

Set Up Cline

1. Install Cline

Install the Cline extension from the VS Code marketplace, or search “Cline” in VS Code Extensions.
2. Configure Wafer as a provider

  1. Open VS Code and click the Cline icon in the sidebar
  2. Click the settings gear icon in the Cline panel
  3. In the API Provider dropdown, select OpenAI Compatible
  4. Fill in these fields:
  • Base URL: https://pass.wafer.ai/v1
  • API Key: your Wafer API key
  • Model ID: Qwen3.5-397B-A17B
Do not include /chat/completions in the Base URL — Cline appends that automatically.
3. Set model info (recommended)

Expand Model Configuration and set:
  • Context Window Size: 131072
  • Max Output Tokens: 32768
  • Supports Images: unchecked
4. Verify the connection

Send a message in the Cline panel. If Cline responds, you’re connected.

Set Up Roo Code

1. Install Roo Code

Install the Roo Code extension from the VS Code marketplace, or search “Roo Code” in VS Code Extensions.
2. Configure Wafer as a provider

  1. Open VS Code and click the Roo Code icon in the sidebar
  2. Click the settings gear icon in the Roo Code panel
  3. In the API Provider dropdown, select OpenAI Compatible
  4. Fill in these fields:
  • Base URL: https://pass.wafer.ai/v1
  • API Key: your Wafer API key
  • Model ID: Qwen3.5-397B-A17B
3. Set model info (recommended)

Optionally configure:
  • Context Window Size: 131072
  • Max Output Tokens: 32768
4. Start coding

Send a message in the Roo Code panel to confirm the connection.

Set Up Kilo Code

1. Install Kilo Code

Install the Kilo Code extension from the VS Code marketplace, or search “Kilo Code” in VS Code Extensions.
2. Configure Wafer as a provider

  1. Open Kilo Code and click the settings gear icon
  2. Go to the Providers tab
  3. Click Custom provider at the bottom
  4. Fill in the dialog:
  • Provider ID: wafer
  • Display Name: Wafer
  • Base URL: https://pass.wafer.ai/v1
  • API Key: your Wafer API key
  • Model: Qwen3.5-397B-A17B
  5. Click Save
If you’re on an older version of Kilo Code without the Providers tab, select OpenAI Compatible from the API Provider dropdown and enter the same Base URL, API key, and Model ID.
3. Start coding

Send a message in the Kilo Code panel to confirm the connection.

Set Up OpenHands

1. Install OpenHands

Follow the OpenHands installation guide. The quickest way:
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.44-nikolaik
docker run -it --rm \
  -p 3000:3000 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  docker.all-hands.dev/all-hands-ai/openhands:0.44
2. Configure Wafer as the LLM (UI)

  1. Open the OpenHands UI (usually at http://localhost:3000)
  2. Click the settings gear icon
  3. Click Advanced to expand advanced options
  4. Set these fields:
  • Custom Model: openai/Qwen3.5-397B-A17B
  • Base URL: https://pass.wafer.ai/v1
  • API Key: your Wafer API key
The openai/ prefix is required. OpenHands uses litellm under the hood, and this prefix tells it to use the OpenAI-compatible completion path.
3. Alternative: config.toml

If you prefer file-based config, create or edit config.toml in the project root:
[llm]
model = "openai/Qwen3.5-397B-A17B"
api_key = "YOUR_WAFER_API_KEY"
base_url = "https://pass.wafer.ai/v1"
4. Start coding

Open a conversation in the OpenHands UI to confirm the connection.

Use Wafer with Other Harnesses

Most agent harnesses only need the settings below.

OpenAI-compatible harnesses (Cline, Roo Code, Kilo Code, OpenClaw, OpenHands, etc.):
  • Base URL: https://pass.wafer.ai/v1
  • Model: Qwen3.5-397B-A17B or GLM-5.1
  • Authentication: your Wafer API key
  • Compatibility mode: OpenAI-compatible / OpenAI API
For example, as a generic JSON provider config:
{
  "baseUrl": "https://pass.wafer.ai/v1",
  "apiKey": "YOUR_WAFER_API_KEY",
  "model": "Qwen3.5-397B-A17B"
}
Or, for GLM:
{
  "baseUrl": "https://pass.wafer.ai/v1",
  "apiKey": "YOUR_WAFER_API_KEY",
  "model": "GLM-5.1"
}
Anthropic-compatible harnesses (Claude Code, or any tool using the Anthropic Messages API):
  • Base URL: https://pass.wafer.ai (the tool appends /v1/messages automatically)
  • Authentication: your Wafer API key via ANTHROPIC_API_KEY
  • Model: no override needed — Wafer routes all requests to the fastest available model
export ANTHROPIC_BASE_URL=https://pass.wafer.ai
export ANTHROPIC_API_KEY=YOUR_WAFER_API_KEY
If the harness asks for a provider name, you can label it Wafer. If it asks whether your key is a bearer token or an API key, use the same Wafer key from your access email.
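The recurring pitfall across harnesses is the /v1 suffix: OpenAI-compatible tools want it, Anthropic-compatible tools do not. A small hypothetical helper (the function name is ours, not part of any harness) that derives both forms from whichever base URL you have:

```python
def wafer_base_urls(base: str) -> dict[str, str]:
    """Given either form of the Wafer base URL, return the URL each harness style expects."""
    root = base.rstrip("/")
    if root.endswith("/v1"):
        root = root[: -len("/v1")]  # strip the suffix so both forms can be rebuilt
    return {
        "openai_compatible": root + "/v1",  # Cline, Roo Code, Kilo Code, OpenClaw, ...
        "anthropic_compatible": root,       # Claude Code (it appends /v1/messages itself)
    }

print(wafer_base_urls("https://pass.wafer.ai/v1"))
```

Either input form produces the same pair, so it doesn’t matter which URL you copied from your access email.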

FAQ

What models do I get with Wafer Pass?
Wafer Pass currently covers Qwen3.5-397B-A17B and GLM-5.1 through the Wafer endpoint. We’re adding more models soon — same subscription, no price increase.

Can I share my Pass or API key?
No. Keep your API key private and use it only for your own workflows.

How do I get access?
Apply at wafer.ai/pass. We’re onboarding in small batches.

Can I choose which model I use?
For OpenAI-compatible harnesses, yes — use Qwen3.5-397B-A17B or GLM-5.1 with the https://pass.wafer.ai/v1 endpoint. For Claude Code (Anthropic-compatible), no — Wafer auto-routes.

How many concurrent requests do I get?
Today each user gets 1 concurrent request. We expect to raise that limit over time.

Will you add more models?
Yes. We’re optimizing the best coding models and adding them to the plan. Price stays the same.