Documentation

Current API

Current is a drop-in, OpenAI-compatible inference marketplace. Sellers list competing offers; Current scores them on five axes (price, latency, uptime, liquidity, and health), routes each request to the highest-scoring offer, and fails over automatically. One API, one bill, and the competitive floor every time.

Introduction

If you already use the OpenAI SDK, you can switch to Current by changing two values: the base URL and the API key. Everything else (chat, completions, embeddings, streaming, tools) works unchanged. Current then routes each request across providers and returns a non-breaking x_current object so you can see exactly which provider served it and what it cost.

Base URL       https://api.currentinference.com
Auth header    Authorization: Bearer cur_live_...
Content type   application/json
Streaming      text/event-stream (SSE)

Quickstart

1. Create an API key in the dashboard (new accounts start with free credit).
2. Point any OpenAI-compatible client at Current.
3. Make your first call:

curl
curl https://api.currentinference.com/v1/chat/completions \
  -H "Authorization: Bearer cur_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [{"role": "user", "content": "Explain routing in one sentence."}]
  }'

With the OpenAI SDK (Python)

python
from openai import OpenAI

client = OpenAI(
    api_key="cur_live_...",
    base_url="https://api.currentinference.com/v1",   # the only change
)

resp = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)

With the OpenAI SDK (TypeScript)

typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "cur_live_...",
  baseURL: "https://api.currentinference.com/v1",   // the only change
});

const resp = await client.chat.completions.create({
  model: "llama-3.3-70b",
  messages: [{ role: "user", content: "Hello" }],
});
console.log(resp.choices[0].message.content);

Authentication

Every request must include your secret key as a bearer token. Keys are created in the dashboard, shown once at creation, and prefixed cur_live_. Keep them server-side; never ship a key in client-side code.

bash
Authorization: Bearer cur_live_...

A missing or invalid key returns 401 invalid_api_key. Revoke a leaked key from the dashboard at any time, and revoked keys stop working immediately. Keys can carry an optional monthly spend cap (402 key_cap_reached once hit).

cur_test_ keys are a free sandbox: requests route through the real engine but are served by Current’s deterministic mock provider at $0, which is ideal for CI and integration tests. Sandbox responses carry x_current.mode: "test".
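
As a rough illustration of the key handling described above, the sketch below reads the key from an environment variable and uses the documented prefixes to tell live keys from sandbox keys. The variable name CURRENT_API_KEY and the helper names are assumptions made for this example, not part of the Current API.

```python
import os

LIVE_PREFIX = "cur_live_"   # billed requests, real providers
TEST_PREFIX = "cur_test_"   # free sandbox, mock provider


def key_mode(api_key: str) -> str:
    """Return "live" or "test" based on the documented key prefixes."""
    if api_key.startswith(LIVE_PREFIX):
        return "live"
    if api_key.startswith(TEST_PREFIX):
        return "test"
    raise ValueError("expected a key starting with cur_live_ or cur_test_")


def auth_headers(api_key: str) -> dict:
    """Build the headers every request needs."""
    key_mode(api_key)  # fail early on an obviously malformed key
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


def load_key(env_var: str = "CURRENT_API_KEY") -> str:
    """Read the key from the environment (hypothetical variable name)."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    return key
```

A CI job could export a cur_test_ key under the same variable name, so application code stays identical between tests and production.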

Models

Request a model by its Current id and the router resolves it to the highest-scoring healthy provider that serves it. The catalog spans 15 providers (OpenAI, Anthropic, Groq, Together, Fireworks, DeepSeek, Mistral, xAI, Google, Cerebras, SambaNova, DeepInfra, Novita, Nebius, and Venice) and covers the open models with the widest provider spread (Llama 3.3/4, DeepSeek V3/R1, Qwen3, Kimi K2, and more). The live, authoritative list is always GET /v1/models, and the cross-provider price board is public at GET /v1/network.

Model id                                  Type        Example spread
llama-3.3-70b                             chat        9 providers (Groq, DeepInfra, Cerebras, Venice, …)
deepseek-v3                               chat        6 providers
kimi-k2                                   chat        5 providers
qwen3-235b                                chat        5 providers
gpt-4o-mini · gpt-4.1-mini · gpt-5-mini   chat        OpenAI
claude-haiku · claude-haiku-4.5           chat        Anthropic
venice-uncensored                         chat        Venice (zero data retention)
text-embedding-3-small                    embeddings  OpenAI

Aliases & shortcuts: auto resolves to the flagship default. Suffixes set the routing objective in the model id itself: llama-3.3-70b:cheapest (cost-only) and llama-3.3-70b:fastest (latency-only); :floor and :nitro are accepted as synonyms. A fully-qualified provider/model id (e.g. groq/llama-3.3-70b) pins that provider.
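
To make the id grammar above concrete, here is a small client-side sketch that splits an id into an optional pinned provider, the model name, and an optional routing objective. It is only an illustration: the helper names are hypothetical, and mapping :floor to :cheapest and :nitro to :fastest (in the order the synonyms are listed) is an assumption. Whether a pinned provider and a suffix can be combined in one id is not stated, so the parser simply accepts both.

```python
from typing import NamedTuple, Optional

# Suffixes named in the text. The synonym mapping is an assumption.
OBJECTIVES = {
    "cheapest": "cheapest",
    "floor": "cheapest",
    "fastest": "fastest",
    "nitro": "fastest",
}


class ModelRef(NamedTuple):
    provider: Optional[str]   # set when the id pins a provider
    model: str
    objective: Optional[str]  # "cheapest", "fastest", or None


def parse_model_id(model_id: str) -> ModelRef:
    """Split "provider/model:suffix" into its parts (client-side only)."""
    base, colon, suffix = model_id.partition(":")
    objective = None
    if colon:
        if suffix not in OBJECTIVES:
            raise ValueError(f"unknown routing suffix: {suffix!r}")
        objective = OBJECTIVES[suffix]
    provider, slash, model = base.partition("/")
    if not slash:
        # No "/" present, so the whole base is the model name.
        provider, model = None, base
    return ModelRef(provider, model, objective)
```

The server remains the authority on what an id resolves to; a helper like this is mainly useful for logging or validating configuration before a request is sent.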

Honest caveat: the “same” open model can differ across providers in quantization, context window, and tokenizer, so outputs and token counts are not byte-identical. x_current.selected_provider always tells you who served a request. Embedding ids are never aliased across different underlying models (vectors are only comparable within one model).

Chat completions

POST /v1/chat/completions is OpenAI-compatible. It accepts the standard fields (messages, temperature, max_tokens, tools, response_format, stream, …) plus the optional Current routing extension. Unknown fields are ignored where OpenAI tolerates them.

json request
{
  "model": "llama-3.3-70b",
  "messages": [{"role": "user", "content": "Hello"}],
  "routing": {
    "cost": 0.6, "latency": 0.2, "uptime": 0.1, "liquidity": 0.05, "health": 0.05,
    "providers": ["groq", "together"],
    "provider": null,
    "max_cost_per_mtok": 1.00
  }
}

The response is the standard OpenAI object with an added x_current:

json response
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1733500000,
  "model": "llama-3.3-70b",
  "choices": [
    { "index": 0, "message": { "role": "assistant", "content": "Hi!" }, "finish_reason": "stop" }
  ],
  "usage": { "prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21 },
  "x_current": {
    "selected_provider": "groq",
    "offer_id": 3,
    "score": 0.60,
    "score_breakdown": { "cost": 0.40, "latency": 0.20, "uptime": 0.0, "liquidity": 0.0, "health": 0.0 },
    "provider_cost_per_mtok": 0.60,
    "billed_per_mtok": 0.60,
    "failover_count": 0,
    "cache_hit": false,
    "failover_order": ["groq", "fireworks", "together"],
    "usage": { "estimated": false, "provider_cost_usd": 0.000014, "routing_fee_usd": 0.000001, "total_usd": 0.000015 },
    "savings": { "vs_most_expensive_usd": 0.000006, "pct": 31.2 }
  }
}
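
A minimal sketch of reading the routing decision out of a parsed response body. It works on a plain dict so it does not depend on any client library; the function name and the shape of the summary are choices made for this example. The sample is a trimmed copy of the response shown above.

```python
def summarize_routing(response: dict) -> dict:
    """Collect the fields most useful for logging a routing decision."""
    x = response["x_current"]
    return {
        "provider": x["selected_provider"],
        "failed_over": x["failover_count"] > 0,
        "total_usd": x["usage"]["total_usd"],
        "score": x["score"],
        # Sum of the per-axis contributions, rounded to hide float noise.
        "breakdown_total": round(sum(x["score_breakdown"].values()), 6),
    }


SAMPLE = {
    "model": "llama-3.3-70b",
    "x_current": {
        "selected_provider": "groq",
        "score": 0.60,
        "score_breakdown": {"cost": 0.40, "latency": 0.20, "uptime": 0.0,
                            "liquidity": 0.0, "health": 0.0},
        "failover_count": 0,
        "usage": {"estimated": False, "provider_cost_usd": 0.000014,
                  "routing_fee_usd": 0.000001, "total_usd": 0.000015},
    },
}

summary = summarize_routing(SAMPLE)
print(summary["provider"], summary["total_usd"])
```

With the stock OpenAI Python SDK, unknown response fields are usually kept on the parsed object, so something like resp.model_extra.get("x_current") may expose the same data. Treat that as an assumption and check it against your SDK version.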

Streaming

Set stream: true for standard OpenAI SSE: data: lines of chat.completion.chunk objects ending with data: [DONE]. The x_current routing decision and final usage arrive on the last chunk. Failover happens before the first byte. Once streaming starts there is no silent provider switch, and a mid-stream upstream failure is surfaced as an SSE error event.

bash
curl https://api.currentinference.com/v1/chat/completions \
  -H "Authorization: Bearer cur_live_..." \
  -H "Content-Type: application/json" \
  -d '{ "model": "llama-3.3-70b",
        "messages": [{"role":"user","content":"Stream a haiku."}],
        "stream": true }'

# Server-sent events:
# data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hi"}}]}
# data: {"id":"...","choices":[{"delta":{}}],"usage":{...},"x_current":{...}}
# data: [DONE]
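
The event sequence above can be consumed with a few lines of parsing. The sketch below takes already-decoded SSE lines (from whatever HTTP client you use), concatenates the delta content, and keeps the x_current object from the chunk that carries it. The function name is hypothetical. How a mid-stream error event is encoded is not specified here, so the sketch does not try to interpret one.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Return the concatenated text and the x_current object, if one arrived."""
    parts = []
    x_current = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                parts.append(text)
        if "x_current" in chunk:
            x_current = chunk["x_current"]
    return "".join(parts), x_current
```

Because x_current only arrives on the last chunk, cost and provider are known once the stream has finished, not while tokens are still arriving.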

Embeddings

POST /v1/embeddings takes a single string or an array and returns the OpenAI { object: "list", data: [{ embedding, index }], model, usage } shape, routed to a provider that serves the embedding model.

curl
curl https://api.currentinference.com/v1/embeddings \
  -H "Authorization: Bearer cur_live_..." \
  -H "Content-Type: application/json" \
  -d '{ "model": "text-embedding-3-small", "input": ["hello", "world"] }'
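
As an illustration of working with the response shape described above, the sketch below pulls the vectors out in input order and compares two of them with cosine similarity. The helper names are hypothetical, and the response is handled as a plain dict.

```python
import math
from typing import List


def vectors_in_order(response: dict) -> List[List[float]]:
    """Return the embeddings sorted by their index field."""
    items = sorted(response["data"], key=lambda d: d["index"])
    return [d["embedding"] for d in items]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm
```

Only compare vectors that came from the same embedding model; the model field of each response is the value to check before mixing results.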

Legacy completions

POST /v1/completions is the legacy text completion endpoint. It uses the same routing and billing as chat. Pass prompt instead of messages.

List models

GET /v1/models returns the routable models as the OpenAI list shape:

json
{ "object": "list", "data": [ { "id": "llama-3.3-70b", "object": "model", "owned_by": "current" } ] }

The routing extension

Every inference call accepts an optional routing object to steer or override the decision per request. All fields are optional; omitted weights fall back to your account defaults (cost 0.40 · latency 0.20 · uptime 0.20 · liquidity 0.10 · health 0.10). Higher weight = that axis matters more; the highest total score wins.

Field               Type            Meaning
cost                number          Weight on price (prefer cheaper offers).
latency             number          Weight on speed.
uptime              number          Weight on the offer’s rolling success rate.
liquidity           number          Weight on available capacity.
health              number          Weight on the offer’s reputation score (0 to 100).
providers           string[]        Allowed set. Route only among these.
provider            string | null   Pin one provider (skips routing).
max_cost_per_mtok   number          Hard ceiling on billed price ($/Mtok).
cache               boolean         Reserved. Response caching is on the roadmap, so it’s ignored today (cache_hit is always false).

Via the OpenAI SDK, send it through extra_body:

python
client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={"routing": {"cost": 0.7, "latency": 0.2, "uptime": 0.1}},
)
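
To show how the weights, the allowed set, and the price ceiling could interact, here is a simplified sketch of a weighted-sum selection. It is not Current's actual scoring code: how raw price, latency, and the other axes are normalized is not documented here, so each candidate is assumed to arrive with per-axis values already scaled to the range 0 to 1 under a hypothetical "normalized" key. Provider pinning is left out.

```python
from typing import Dict, List, Optional

AXES = ("cost", "latency", "uptime", "liquidity", "health")
# Account defaults quoted in the text.
DEFAULT_WEIGHTS = {"cost": 0.40, "latency": 0.20, "uptime": 0.20,
                   "liquidity": 0.10, "health": 0.10}


def effective_weights(routing: Optional[dict]) -> Dict[str, float]:
    """Per-request weights, with omitted axes falling back to the defaults."""
    routing = routing or {}
    return {axis: routing.get(axis, DEFAULT_WEIGHTS[axis]) for axis in AXES}


def score(normalized: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum over the five axes; inputs are assumed to be in [0, 1]."""
    return sum(weights[a] * normalized.get(a, 0.0) for a in AXES)


def pick(candidates: List[dict], routing: Optional[dict] = None) -> dict:
    """Apply the allowed set and price ceiling, then return the top scorer."""
    routing = routing or {}
    weights = effective_weights(routing)
    allowed = routing.get("providers")
    ceiling = routing.get("max_cost_per_mtok")
    eligible = [
        c for c in candidates
        if (allowed is None or c["provider"] in allowed)
        and (ceiling is None or c["billed_per_mtok"] <= ceiling)
    ]
    if not eligible:
        raise LookupError("no eligible offer")
    return max(eligible, key=lambda c: score(c["normalized"], weights))
```

Under these assumptions, a candidate that is best on cost and latency and worst on the other three axes scores 0.40 + 0.20 = 0.60 with the default weights, which matches the breakdown in the sample responses.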

The x_current object

Every response carries x_current, the routing decision laid out in the open:

Field                    Meaning
selected_provider        Provider whose offer served the request.
offer_id                 The winning offer’s id.
score                    Winning (highest) routing score.
score_breakdown          Per-axis contribution (cost / latency / uptime / liquidity / health).
provider_cost_per_mtok   The offer’s blended price.
billed_per_mtok          What you pay, the offer’s price (no markup).
failover_count           How many providers were tried before success.
failover_order           The ranked candidate list considered.
usage                    Exact money: provider_cost_usd + routing_fee_usd = total_usd, plus estimated, which is true when a provider omitted token counts and Current estimated them (tiktoken).
savings                  What this request saved vs the most expensive eligible candidate (vs_most_expensive_usd, pct).
requested_model          Present when an alias (auto) or suffix (:cheapest) was resolved.
cache_hit                Whether a cached response was served (always false until caching ships).

Every response also carries an X-Request-Id header (or echoes yours). Include it when contacting support so we can trace the exact request.

Routing preview

GET /v1/routing/preview?model=<id> returns the full ranked candidate list and score breakdown for a model without running inference. It’s the data behind the dashboard’s “why this provider?” view.

json
{
  "model": "llama-3.3-70b",
  "weights": { "cost": 0.4, "latency": 0.2, "uptime": 0.2, "liquidity": 0.1, "health": 0.1 },
  "candidates": [
    { "provider": "groq", "offer_id": 3, "score": 0.60,
      "breakdown": { "cost": 0.40, "latency": 0.20, "uptime": 0.0, "liquidity": 0.0, "health": 0.0 },
      "cost_per_mtok": 0.60, "billed_per_mtok": 0.60, "latency_ms": 120, "uptime": 0.985, "health_score": 100 }
  ],
  "selected": "groq"
}
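
A short sketch of calling the preview endpoint with only the standard library. Sending the bearer token on this route is an assumption (the text does not say whether the preview needs authentication), and the function names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://api.currentinference.com"


def preview_url(model: str, base_url: str = BASE_URL) -> str:
    """Build the preview URL, escaping the model id for use in a query string."""
    return f"{base_url}/v1/routing/preview?{urlencode({'model': model})}"


def fetch_preview(model: str, api_key: str) -> dict:
    """GET the preview. The Authorization header here is an assumption."""
    req = urllib.request.Request(
        preview_url(model),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def ranked(preview: dict) -> list:
    """(provider, score) pairs from the candidate list, highest score first."""
    pairs = [(c["provider"], c["score"]) for c in preview["candidates"]]
    return sorted(pairs, key=lambda p: p[1], reverse=True)
```

Since the preview runs no inference, it can be a convenient way to try out a set of weights before using them on real traffic.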

Errors

Errors use the OpenAI envelope, so the OpenAI SDK surfaces them natively. Branch on the stable code rather than the message. Current never returns an untyped 500.

json
{
  "error": {
    "message": "The model 'gpt-9' does not exist or is not routable.",
    "type": "invalid_request_error",
    "param": "model",
    "code": "model_not_found"
  }
}

HTTP  code                                    When
401   invalid_api_key                         Missing or bad key.
402   credit_exhausted                        Prepaid balance is ≤ 0. Top up to continue.
400   model_not_found                         Unknown / non-routable model.
400   unsupported_parameter                   No provider supports a requested parameter.
400   context_length_exceeded                 Prompt exceeds every provider’s context window.
400   cost_ceiling_exceeded                   No provider within max_cost_per_mtok.
404   no_provider_available                   No provider serves the model.
404   pinned_provider_unavailable             Pinned provider can’t serve the model.
413   payload_too_large                       Request body over the size cap.
402   key_cap_reached                         This key’s monthly spend cap is reached.
404   not_found                               Unknown route/method (still the OpenAI envelope).
429   rate_limit_exceeded                     Over your rate limit. See Retry-After.
502   upstream_error · upstream_unreachable   The selected provider failed after streaming began (pre-stream failures fail over automatically).
503   all_providers_down                      Every candidate is currently failing.
503   no_provider_configured                  No provider for this model has credentials on this deployment.
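
Branching on the stable code might look like the sketch below. It reads the code from a parsed error body and maps it to a coarse next step. The grouping of codes into "retry later", "billing", and so on is a judgment made for this example, not something the API defines, and the function names are hypothetical.

```python
from typing import Optional

# Grouping of the documented codes by what a caller can do about them.
# The grouping itself is an assumption.
RETRY_LATER = {"rate_limit_exceeded", "upstream_error",
               "upstream_unreachable", "all_providers_down"}
BILLING = {"credit_exhausted", "key_cap_reached"}
AUTH = {"invalid_api_key"}


def error_code(body: dict) -> Optional[str]:
    """Pull the stable code out of an OpenAI-style error envelope."""
    error = body.get("error")
    return error.get("code") if isinstance(error, dict) else None


def next_step(body: dict) -> str:
    """Map a response body to "ok", "retry_later", "billing", "auth" or "fix_request"."""
    code = error_code(body)
    if code is None:
        return "ok"
    if code in RETRY_LATER:
        return "retry_later"
    if code in BILLING:
        return "billing"
    if code in AUTH:
        return "auth"
    # Everything else (model_not_found, cost_ceiling_exceeded, ...) needs a changed request.
    return "fix_request"
```

Matching on the code rather than the message keeps this logic stable if the human-readable wording changes.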

Rate limits

Limits are enforced per API key on the inference path (plus per-IP throttles on auth endpoints). When you exceed a limit you receive 429 rate_limit_exceeded with a Retry-After header (seconds to wait); responses also carry X-RateLimit-Limit and X-RateLimit-Remaining. Back off and retry after the indicated delay.
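
One way to honor Retry-After is sketched below. The delay helper prefers the header when it is a number of seconds and otherwise falls back to exponential backoff with jitter; the fallback, its default values, and the send callable (which is expected to return a status, a headers mapping, and a body) are assumptions for illustration.

```python
import random
import time
from typing import Callable, Mapping, Tuple


def retry_delay(headers: Mapping[str, str], attempt: int,
                base: float = 0.5, cap: float = 30.0) -> float:
    """Seconds to wait before the next try; `attempt` counts from 0."""
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # not a number of seconds; fall back to backoff
    backoff = min(cap, base * (2 ** attempt))
    return random.uniform(0, backoff)  # jitter spreads out simultaneous retries


def call_with_retries(send: Callable[[], Tuple[int, Mapping[str, str], object]],
                      max_attempts: int = 4,
                      sleep: Callable[[float], None] = time.sleep):
    """Call `send` until it returns something other than 429 or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts - 1:
            return status, headers, body
        sleep(retry_delay(headers, attempt))
```

Header lookup here is case-sensitive because it uses a plain mapping; most HTTP clients expose a case-insensitive headers object, which works the same way with this helper.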

Pricing & billing

You pay the winning offer’s price, the competitive floor, with no markup. Input and output tokens are billed at the offer’s separate in/out prices. The marketplace keeps a flat 5% fee out of the seller’s settlement (never added to your bill), and the seller is credited the remaining 95%. x_current.usage carries the exact micro-dollar breakdown per request, where total_usd (what you pay) = provider_cost_usd (seller settlement) + routing_fee_usd (marketplace fee). x_current.savings shows what you saved vs the priciest offer. Billing is prepaid: top up by card in the dashboard (USDC coming), spend is metered in micro-dollars, and the balance hard-stops at zero (402 credit_exhausted). New accounts get free credit to try it.
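
The arithmetic above can be worked through in a few lines. The sketch below prices a request from its token counts and the offer's separate in/out prices, then splits the total into the 95% seller settlement and the 5% marketplace fee. The prices and token counts in the usage lines are made up for illustration, and the exact rounding Current applies to micro-dollars is not specified, so no rounding is done here.

```python
from decimal import Decimal

FEE_RATE = Decimal("0.05")  # flat marketplace fee, taken out of the seller's settlement


def request_cost(prompt_tokens: int, completion_tokens: int,
                 in_per_mtok: float, out_per_mtok: float) -> dict:
    """Return total, seller settlement, and fee in USD as Decimals."""
    million = Decimal(1_000_000)
    total = (Decimal(prompt_tokens) * Decimal(str(in_per_mtok))
             + Decimal(completion_tokens) * Decimal(str(out_per_mtok))) / million
    fee = total * FEE_RATE
    return {
        "total_usd": total,               # what the buyer pays
        "provider_cost_usd": total - fee,  # credited to the seller (95%)
        "routing_fee_usd": fee,            # kept by the marketplace (5%)
    }


# Hypothetical offer: $0.50 per Mtok in, $0.80 per Mtok out.
cost = request_cost(1000, 500, 0.50, 0.80)
print(cost["total_usd"], cost["provider_cost_usd"], cost["routing_fee_usd"])
```

For this hypothetical request the total is $0.0009, of which $0.000855 goes to the seller and $0.000045 is the marketplace fee. The fee is carved out of the total, so the buyer's bill equals the offer's price.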

SDKs

You don’t need a Current SDK. The stock OpenAI SDKs work (see Quickstart). The official clients live in the repo (publication to PyPI/npm is in progress) and add first-class, typed access to the routing extension and x_current, retries with Retry-After handling, and streaming helpers.

Python

bash
# not on PyPI yet, so install from the repo:
pip install "git+https://github.com/ekempinski/infera.git#subdirectory=sdk/python"

python
from current import Current, CurrentError

client = Current(api_key="cur_live_...")  # base_url defaults to https://api.currentinference.com

resp = client.chat_completions(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp["choices"][0]["message"]["content"])
print(resp["x_current"]["selected_provider"], resp["x_current"]["billed_per_mtok"])

TypeScript

bash
# not on npm yet, so vendor sdk/typescript from the repo for now;
# or simply use the stock OpenAI SDK (above), which works unchanged.

typescript
import { Current } from "@current/sdk";

const current = new Current({ apiKey: process.env.CURRENT_API_KEY! });

const out = await current.chatCompletions({
  model: "llama-3.3-70b",
  messages: [{ role: "user", content: "Hello" }],
});
console.log(out.choices[0].message.content, out.x_current?.selected_provider);