Documentation

Current API

Current is a drop-in, OpenAI-compatible inference marketplace. Sellers list competing offers; Current scores them on five axes (price, latency, uptime, liquidity, and health), routes each request to the highest-scoring offer, and fails over automatically. One API, one bill, and the competitive floor every time.

Introduction

If you already use the OpenAI SDK, you can switch to Current by changing two values: the base URL and the API key. Everything else (chat, completions, embeddings, streaming, tools) works unchanged. Current then routes each request across providers and returns a non-breaking x_current object so you can see exactly which provider served it and what it cost.

Base URL	https://api.currentinference.com
Auth header	Authorization: Bearer cur_live_...
Content type	application/json
Streaming	text/event-stream (SSE)

Quickstart

1. Create an API key in the dashboard (new accounts start with free credit).
2. Point any OpenAI-compatible client at Current.
3. Make your first call:

curl

curl https://api.currentinference.com/v1/chat/completions \
  -H "Authorization: Bearer cur_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [{"role": "user", "content": "Explain routing in one sentence."}]
  }'

With the OpenAI SDK (Python)

python

from openai import OpenAI

client = OpenAI(
    api_key="cur_live_...",
    base_url="https://api.currentinference.com/v1",   # the only change
)

resp = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)

With the OpenAI SDK (TypeScript)

typescript

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "cur_live_...",
  baseURL: "https://api.currentinference.com/v1",   // the only change
});

const resp = await client.chat.completions.create({
  model: "llama-3.3-70b",
  messages: [{ role: "user", content: "Hello" }],
});
console.log(resp.choices[0].message.content);

Authentication

Every request must include your secret key as a bearer token. Keys are created in the dashboard, shown once at creation, and prefixed cur_live_. Keep them server-side; never ship a key in client-side code.

http

Authorization: Bearer cur_live_...

A missing or invalid key returns 401 invalid_api_key. You can revoke a leaked key from the dashboard at any time; revoked keys stop working immediately. Keys can carry an optional monthly spend cap; once it is reached, requests return 402 key_cap_reached.

cur_test_ keys are a free sandbox: requests route through the real engine but are served by Current’s deterministic mock provider at $0, which is ideal for CI and integration tests. Sandbox responses carry x_current.mode: "test".
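A minimal sketch of keeping the key server-side: read it from the environment and build the two headers listed in the Introduction. The helper names and the CURRENT_API_KEY variable are assumptions for illustration; only the cur_live_ / cur_test_ prefixes and the header format come from the text above.

```python
import os
from typing import Optional

KNOWN_PREFIXES = ("cur_live_", "cur_test_")  # prefixes named in this document


def auth_headers(api_key: Optional[str] = None) -> dict:
    """Build request headers (hypothetical helper, not part of any SDK)."""
    # CURRENT_API_KEY is an assumed variable name, not mandated by the API.
    key = api_key or os.environ.get("CURRENT_API_KEY", "")
    if not key.startswith(KNOWN_PREFIXES):
        raise ValueError("expected a key starting with cur_live_ or cur_test_")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }


def is_sandbox_key(api_key: str) -> bool:
    """True for cur_test_ keys, which the text says are served by the mock provider."""
    return api_key.startswith("cur_test_")
```

Checking the prefix locally only catches obvious mix-ups (for example, pasting another vendor's key); the server remains the authority and answers 401 invalid_api_key for anything it does not accept.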

Models

Request a model by its Current id and the router resolves it to the highest-scoring healthy offer among the providers that serve it. The catalog spans 15 providers (OpenAI, Anthropic, Groq, Together, Fireworks, DeepSeek, Mistral, xAI, Google, Cerebras, SambaNova, DeepInfra, Novita, Nebius, and Venice) and the high-spread open models (Llama 3.3/4, DeepSeek V3/R1, Qwen3, Kimi K2, and more). The live, authoritative list is always GET /v1/models, and the cross-provider price board is public at GET /v1/network.

Model id	Type	Example spread
llama-3.3-70b	chat	9 providers (Groq, DeepInfra, Cerebras, Venice, …)
deepseek-v3	chat	6 providers
kimi-k2	chat	5 providers
qwen3-235b	chat	5 providers
gpt-4o-mini · gpt-4.1-mini · gpt-5-mini	chat	OpenAI
claude-haiku · claude-haiku-4.5	chat	Anthropic
venice-uncensored	chat	Venice (zero data retention)
text-embedding-3-small	embeddings	OpenAI

Aliases & shortcuts: auto resolves to the flagship default. Suffixes set the routing objective in the model id itself: llama-3.3-70b:cheapest (cost-only) and llama-3.3-70b:fastest (latency-only); :floor and :nitro are accepted as synonyms. A fully-qualified provider/model id (e.g. groq/llama-3.3-70b) pins that provider.
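To make the id grammar concrete, here is a client-side sketch that splits an id into provider, model, and routing objective. It is an illustration only, not Current's resolver: parse_model_id and ModelRef are hypothetical names, and mapping :floor to cost-only and :nitro to latency-only (in the order the synonyms are listed) is an assumption.

```python
from typing import NamedTuple, Optional

# Suffix -> routing objective. The text names :cheapest and :fastest;
# treating :floor and :nitro as their synonyms in that order is an assumption.
SUFFIX_OBJECTIVE = {
    "cheapest": "cost-only",
    "fastest": "latency-only",
    "floor": "cost-only",
    "nitro": "latency-only",
}


class ModelRef(NamedTuple):
    provider: Optional[str]   # set when the id is fully qualified
    model: str
    objective: Optional[str]  # set when a routing suffix is present


def parse_model_id(model_id: str) -> ModelRef:
    """Split 'provider/model:suffix' into its parts."""
    base, _, suffix = model_id.partition(":")
    if suffix and suffix not in SUFFIX_OBJECTIVE:
        raise ValueError(f"unknown routing suffix: {suffix!r}")
    provider, sep, model = base.partition("/")
    if not sep:  # no slash, so the whole base is the model id
        provider, model = None, base
    return ModelRef(provider, model, SUFFIX_OBJECTIVE.get(suffix))
```

For example, parse_model_id("groq/llama-3.3-70b") yields a pinned provider with no objective, while parse_model_id("llama-3.3-70b:cheapest") yields no provider and a cost-only objective.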

Honest caveat: the “same” open model can differ across providers in quantization, context window, and tokenizer, so outputs and token counts are not byte-identical. x_current.selected_provider always tells you who served a request. Embedding ids are never aliased across different underlying models (vectors are only comparable within one model).

Chat completions

POST /v1/chat/completions is OpenAI-compatible. It accepts the standard fields (messages, temperature, max_tokens, tools, response_format, stream, …) plus the optional Current routing extension. Unknown fields are ignored where OpenAI tolerates them.

json request

{
  "model": "llama-3.3-70b",
  "messages": [{"role": "user", "content": "Hello"}],
  "routing": {
    "cost": 0.6, "latency": 0.2, "uptime": 0.1, "liquidity": 0.05, "health": 0.05,
    "providers": ["groq", "together"],
    "provider": null,
    "max_cost_per_mtok": 1.00
  }
}

The response is the standard OpenAI object with an added x_current:

json response

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1733500000,
  "model": "llama-3.3-70b",
  "choices": [
    { "index": 0, "message": { "role": "assistant", "content": "Hi!" }, "finish_reason": "stop" }
  ],
  "usage": { "prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21 },
  "x_current": {
    "selected_provider": "groq",
    "offer_id": 3,
    "score": 0.60,
    "score_breakdown": { "cost": 0.40, "latency": 0.20, "uptime": 0.0, "liquidity": 0.0, "health": 0.0 },
    "provider_cost_per_mtok": 0.60,
    "billed_per_mtok": 0.60,
    "failover_count": 0,
    "cache_hit": false,
    "failover_order": ["groq", "fireworks", "together"],
    "usage": { "estimated": false, "provider_cost_usd": 0.000014, "routing_fee_usd": 0.000001, "total_usd": 0.000015 },
    "savings": { "vs_most_expensive_usd": 0.000006, "pct": 31.2 }
  }
}

Streaming

Set stream: true for standard OpenAI SSE: data: lines of chat.completion.chunk objects ending with data: [DONE]. The x_current routing decision and final usage arrive on the last chunk. Failover happens before the first byte. Once streaming starts there is no silent provider switch, and a mid-stream upstream failure is surfaced as an SSE error event.

bash

curl https://api.currentinference.com/v1/chat/completions \
  -H "Authorization: Bearer cur_live_..." \
  -H "Content-Type: application/json" \
  -d '{ "model": "llama-3.3-70b",
        "messages": [{"role":"user","content":"Stream a haiku."}],
        "stream": true }'

# Server-sent events:
# data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hi"}}]}
# data: {"id":"...","choices":[{"delta":{}}],"usage":{...},"x_current":{...}}
# data: [DONE]
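If you consume the stream without an SDK, the framing above can be handled with a few lines. This sketch assumes you already have the response body as an iterable of text lines (how you obtain them depends on your HTTP client); collect_stream is a hypothetical helper. It does not handle the SSE error event, because this document does not specify that event's payload.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Join streamed content deltas and return (text, x_current from the last chunk)."""
    parts = []
    x_current = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
        # Per the text, the routing decision rides on the final chunk.
        x_current = chunk.get("x_current", x_current)
    return "".join(parts), x_current
```

Because x_current only arrives at the end, anything that depends on the provider or the cost has to wait until the stream is fully consumed.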

Embeddings

POST /v1/embeddings takes a single string or an array of strings. It returns the OpenAI { object: "list", data: [{ embedding, index }], model, usage } shape and is routed to a provider that serves the embedding model.

curl

curl https://api.currentinference.com/v1/embeddings \
  -H "Authorization: Bearer cur_live_..." \
  -H "Content-Type: application/json" \
  -d '{ "model": "text-embedding-3-small", "input": ["hello", "world"] }'
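A short sketch of working with the returned list shape: put the vectors back in input order using index, then compare two of them. Both helper names are hypothetical, and the comparison is only meaningful when both vectors come from the same embedding model.

```python
import math


def ordered_vectors(resp: dict) -> list:
    """Return embeddings sorted by their `index`, matching the input order."""
    items = sorted(resp["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine(a: list, b: list) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Sorting by index is defensive: it keeps the pairing between inputs and vectors correct even if a response lists items out of order.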

Legacy completions

POST /v1/completions is the legacy text completion endpoint. It uses the same routing and billing as chat. Pass prompt instead of messages.

List models

GET /v1/models returns the routable models as the OpenAI list shape:

json

{ "object": "list", "data": [ { "id": "llama-3.3-70b", "object": "model", "owned_by": "current" } ] }

The routing extension

Every inference call accepts an optional routing object to steer or override the decision per request. All fields are optional; omitted weights fall back to your account defaults (cost 0.40 · latency 0.20 · uptime 0.20 · liquidity 0.10 · health 0.10). Higher weight = that axis matters more; the highest total score wins.

Field	Type	Meaning
cost	number	Weight on price (prefer cheaper offers).
latency	number	Weight on speed.
uptime	number	Weight on the offer’s rolling success rate.
liquidity	number	Weight on available capacity.
health	number	Weight on the offer’s reputation score (the score itself ranges from 0 to 100).
providers	string[]	Allowed set. Route only among these.
provider	string | null	Pin one provider (skips routing).
max_cost_per_mtok	number	Hard ceiling on billed price ($/Mtok).
cache	boolean	Reserved. Response caching is on the roadmap, so it’s ignored today (`cache_hit` is always false).

Via the OpenAI SDK, send it through extra_body:

python

client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={"routing": {"cost": 0.7, "latency": 0.2, "uptime": 0.1}},
)
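The fallback rule for omitted weights can be read as a per-axis overlay on the account defaults. The sketch below takes that sentence literally; effective_weights is a hypothetical helper, and whether the server renormalizes weights that do not sum to 1 is not stated here, so the sketch leaves them as given.

```python
# Account defaults quoted in the text above.
DEFAULT_WEIGHTS = {
    "cost": 0.40, "latency": 0.20, "uptime": 0.20, "liquidity": 0.10, "health": 0.10,
}


def effective_weights(routing: dict) -> dict:
    """Overlay per-request weights on the defaults; non-weight fields are ignored."""
    weights = dict(DEFAULT_WEIGHTS)
    for axis in DEFAULT_WEIGHTS:
        if routing.get(axis) is not None:
            weights[axis] = float(routing[axis])
    return weights
```

Under this reading, a request that sets only cost, latency, and uptime still carries the default liquidity and health weights, so set every axis explicitly if you want the unlisted ones to count for nothing.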

The x_current object

Every response carries x_current, the routing decision laid out in the open:

Field	Meaning
selected_provider	Provider whose offer served the request.
offer_id	The winning offer’s id.
score	Winning (highest) routing score.
score_breakdown	Per-axis contribution (cost / latency / uptime / liquidity / health).
provider_cost_per_mtok	The offer’s blended price.
billed_per_mtok	What you pay: the offer’s price, with no markup.
failover_count	How many providers were tried before success.
failover_order	The ranked candidate list considered.
usage	Exact money: `provider_cost_usd` + `routing_fee_usd` = `total_usd`, plus `estimated`, which is true when a provider omitted token counts and Current estimated them (tiktoken).
savings	What this request saved vs the most expensive eligible candidate (`vs_most_expensive_usd`, `pct`).
requested_model	Present when an alias (`auto`) or suffix (`:cheapest`) was resolved.
cache_hit	Whether a cached response was served (always false until caching ships).

Every response also carries an X-Request-Id header (or echoes yours). Include it when contacting support so we can trace the exact request.
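As a sketch of how you might log these fields, the helper below takes a response body already parsed into a dict and checks that the money fields add up. cost_summary is a hypothetical name; the comparison is done in whole micro-dollars to avoid floating-point noise.

```python
def cost_summary(response: dict) -> dict:
    """Pull the provider and dollar figures out of a parsed response body."""
    xc = response["x_current"]
    usage = xc["usage"]

    def micro(usd: float) -> int:
        return round(usd * 1_000_000)  # whole micro-dollars

    return {
        "provider": xc["selected_provider"],
        "failovers": xc["failover_count"],
        "total_usd": usage["total_usd"],
        "estimated": usage["estimated"],
        # provider_cost_usd + routing_fee_usd should equal total_usd.
        "adds_up": micro(usage["provider_cost_usd"]) + micro(usage["routing_fee_usd"])
        == micro(usage["total_usd"]),
    }
```

With the stock OpenAI Python SDK, calling resp.model_dump() should give a dict of this shape, assuming the SDK version in use preserves unknown top-level fields such as x_current.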

Routing preview

GET /v1/routing/preview?model=<id> returns the full ranked candidate list and score breakdown for a model without running inference. It’s the data behind the dashboard’s “why this provider?” view.

json

{
  "model": "llama-3.3-70b",
  "weights": { "cost": 0.4, "latency": 0.2, "uptime": 0.2, "liquidity": 0.1, "health": 0.1 },
  "candidates": [
    { "provider": "groq", "offer_id": 3, "score": 0.60,
      "breakdown": { "cost": 0.40, "latency": 0.20, "uptime": 0.0, "liquidity": 0.0, "health": 0.0 },
      "cost_per_mtok": 0.60, "billed_per_mtok": 0.60, "latency_ms": 120, "uptime": 0.985, "health_score": 100 }
  ],
  "selected": "groq"
}
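A small sketch of reading a preview payload: rank the candidates and name the axis that contributed most to the winner. It assumes a candidate's score is the sum of its breakdown values, which matches the sample above (0.40 + 0.20 = 0.60) but is not stated as a rule; both function names are hypothetical.

```python
def rank_candidates(preview: dict) -> list:
    """Return candidates sorted best-first by the sum of their per-axis breakdown."""
    def total(candidate: dict) -> float:
        return sum(candidate["breakdown"].values())

    return sorted(preview["candidates"], key=total, reverse=True)


def explain(preview: dict) -> str:
    """One-line summary of the top candidate and its strongest axis."""
    best = rank_candidates(preview)[0]
    top_axis = max(best["breakdown"], key=best["breakdown"].get)
    return f"{best['provider']} wins, driven mostly by {top_axis}"
```

For the sample payload above, explain returns "groq wins, driven mostly by cost".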

Errors

Errors use the OpenAI envelope, so the OpenAI SDK surfaces them natively. Branch on the stable code rather than the message. Current never returns an untyped 500.

json

{
  "error": {
    "message": "The model 'gpt-9' does not exist or is not routable.",
    "type": "invalid_request_error",
    "param": "model",
    "code": "model_not_found"
  }
}

HTTP	code	When
400	model_not_found	Unknown / non-routable model.
400	unsupported_parameter	No provider supports a requested parameter.
400	context_length_exceeded	Prompt exceeds every provider’s context window.
400	cost_ceiling_exceeded	No provider within `max_cost_per_mtok`.
401	invalid_api_key	Missing or bad key.
402	credit_exhausted	Prepaid balance is ≤ 0. Top up to continue.
402	key_cap_reached	This key’s monthly spend cap is reached.
404	no_provider_available	No provider serves the model.
404	pinned_provider_unavailable	Pinned provider can’t serve the model.
404	not_found	Unknown route/method (still the OpenAI envelope).
413	payload_too_large	Request body over the size cap.
429	rate_limit_exceeded	Over your rate limit. See `Retry-After`.
502	upstream_error · upstream_unreachable	The selected provider failed after streaming began (pre-stream failures fail over automatically).
503	all_providers_down	Every candidate is currently failing.
503	no_provider_configured	No provider for this model has credentials on this deployment.
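Branching on code might look like the sketch below, which maps an error envelope to a suggested client reaction. The codes come from the table above; the grouping into retry / billing / auth / fix_request is an assumption about sensible client behaviour, not documented policy, and error_action is a hypothetical helper.

```python
# Codes from the table above, grouped by a suggested client reaction.
# The grouping itself is an assumption, not documented behaviour.
RETRYABLE = {
    "rate_limit_exceeded",
    "upstream_error",
    "upstream_unreachable",
    "all_providers_down",
}
BILLING = {"credit_exhausted", "key_cap_reached"}
AUTH = {"invalid_api_key"}


def error_action(body: dict) -> str:
    """Map an OpenAI-style error envelope to a coarse next step."""
    code = (body.get("error") or {}).get("code")
    if code in RETRYABLE:
        return "retry"
    if code in BILLING:
        return "billing"
    if code in AUTH:
        return "auth"
    return "fix_request"  # e.g. model_not_found, cost_ceiling_exceeded
```

Keying on the code rather than the HTTP status matters here because several statuses are shared: 402 covers both an empty balance and a per-key cap, and 404 covers three distinct conditions.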

Rate limits

Limits are enforced per API key on the inference path (plus per-IP throttles on auth endpoints). When you exceed a limit you receive 429 rate_limit_exceeded with a Retry-After header (seconds to wait); responses also carry X-RateLimit-Limit and X-RateLimit-Remaining. Back off and retry after the indicated delay.
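A minimal sketch of the back-off step: honour Retry-After when it is present, otherwise fall back to capped exponential back-off. retry_delay is a hypothetical helper, and the fallback schedule (0.5 s base, 30 s cap) is an assumption; only the Retry-After header in seconds comes from the text above.

```python
def retry_delay(headers: dict, attempt: int, cap: float = 30.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    # Header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("retry-after")
    if value is not None:
        try:
            return max(0.0, float(value))
        except (TypeError, ValueError):
            pass  # not a number of seconds; fall through to back-off
    # Assumed fallback: 0.5 s, 1 s, 2 s, ... capped at `cap` seconds.
    return min(cap, 0.5 * (2 ** attempt))
```

A caller would sleep for retry_delay(response_headers, attempt) after each 429 and give up after a fixed number of attempts.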

Pricing & billing

You pay the winning offer’s price, the competitive floor, with no markup. Input and output tokens are billed at the offer’s separate in/out prices. The marketplace keeps a flat 5% fee out of the seller’s settlement (never added to your bill), and the seller is credited the remaining 95%. x_current.usage carries the exact micro-dollar breakdown per request, where total_usd (what you pay) = provider_cost_usd (seller settlement) + routing_fee_usd (marketplace fee). x_current.savings shows what you saved vs the priciest offer.

Billing is prepaid: top up by card in the dashboard (USDC coming), spend is metered in micro-dollars, and the balance hard-stops at zero (402 credit_exhausted). New accounts get free credit to try it.
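The arithmetic described above can be sketched in two steps: price the tokens at the offer's separate in/out rates, then split the total 95/5. Function names and the prices in the usage note are hypothetical, and the sketch does not model rounding to whole micro-dollars, so results on very small requests may differ from x_current.usage by a micro-dollar.

```python
FEE_RATE = 0.05  # flat marketplace fee stated in the text


def request_cost_usd(prompt_tokens: int, completion_tokens: int,
                     in_per_mtok: float, out_per_mtok: float) -> float:
    """Billed amount: input and output tokens priced separately, in $/Mtok."""
    return (prompt_tokens * in_per_mtok + completion_tokens * out_per_mtok) / 1_000_000


def split_total(total_usd: float) -> tuple:
    """Split what the buyer pays into (seller settlement, marketplace fee)."""
    fee = total_usd * FEE_RATE
    return total_usd - fee, fee
```

With hypothetical prices of $0.50/Mtok in and $1.00/Mtok out, a request with 1,000,000 prompt tokens and 500,000 completion tokens comes to $1.00, of which $0.95 settles to the seller and $0.05 is the marketplace fee; the buyer's bill is still $1.00.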

SDKs

You don’t need a Current SDK. The stock OpenAI SDKs work (see Quickstart). The official clients live in the repo (publication to PyPI/npm is in progress) and add first-class, typed access to the routing extension and x_current, retries with Retry-After handling, and streaming helpers.

Python

bash

# not on PyPI yet, so install from the repo:
pip install "git+https://github.com/ekempinski/infera.git#subdirectory=sdk/python"

python

from current import Current, CurrentError

client = Current(api_key="cur_live_...")  # base_url defaults to https://api.currentinference.com

resp = client.chat_completions(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp["choices"][0]["message"]["content"])
print(resp["x_current"]["selected_provider"], resp["x_current"]["billed_per_mtok"])

TypeScript

bash

# not on npm yet, so vendor sdk/typescript from the repo for now;
# or simply use the stock OpenAI SDK (above), which works unchanged.

typescript

import { Current } from "@current/sdk";

const current = new Current({ apiKey: process.env.CURRENT_API_KEY! });

const out = await current.chatCompletions({
  model: "llama-3.3-70b",
  messages: [{ role: "user", content: "Hello" }],
});
console.log(out.choices[0].message.content, out.x_current?.selected_provider);
