The Swarms API exposes an OpenAI-compatible POST /v1/chat/completions endpoint. If your application already uses the OpenAI SDK, you can switch to Swarms by changing two lines — the base_url and api_key — and everything else works unchanged. Under the hood, every request is routed through the full Swarms agent infrastructure: model routing, token counting, billing, and logging all apply exactly as they do for the native /v1/agent/completions endpoint.

Endpoint Information

  • URL: /v1/chat/completions
  • Method: POST
  • Authentication: Required (x-api-key header or Authorization: Bearer <key>)
  • Rate Limiting: Subject to tier-based rate limits

Authentication

Two authentication methods are supported. Both work on all Swarms API endpoints.
| Method | Header | Example |
|---|---|---|
| API key header | x-api-key: <key> | x-api-key: sk-abc123 |
| Bearer token | Authorization: Bearer <key> | Authorization: Bearer sk-abc123 |
The Bearer token method is what the OpenAI SDK sends by default, so it works out of the box.
Get your API key at swarms.world/platform/api-keys.
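As an illustration, both header styles can be built as plain dictionaries and attached to any HTTP client. The key below is a placeholder, and the requests library is shown only as one possible client choice:

```python
api_key = "sk-abc123"  # placeholder; use your real key from swarms.world/platform/api-keys

# Option 1: Swarms-native header
x_api_key_headers = {"x-api-key": api_key}

# Option 2: OpenAI-style header (what the OpenAI SDK sends by default)
bearer_headers = {"Authorization": f"Bearer {api_key}"}

# Either dict can be merged into the headers of a raw HTTP request, e.g.:
# requests.post("https://api.swarms.world/v1/chat/completions",
#               headers={**bearer_headers, "Content-Type": "application/json"},
#               json=payload)
print(bearer_headers["Authorization"])  # Bearer sk-abc123
```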

Request Schema

ChatCompletionRequest Object

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | — | Model to use for completion (e.g. gpt-4o, claude-sonnet-4-20250514, gpt-4o-mini). Any model supported by the Swarms API is accepted |
| messages | List[ChatMessage] | Yes | — | A list of messages comprising the conversation (see ChatMessage Object) |
| temperature | float | No | 0.5 | Sampling temperature (0.0 – 2.0). Lower values produce more deterministic output |
| max_tokens | integer | No | 8192 | Maximum number of tokens to generate in the response |
| max_completion_tokens | integer | No | — | Alternative to max_tokens. Takes precedence if both are set |
| stream | boolean | No | false | If true, returns Server-Sent Events (SSE) in the OpenAI chunk format |
| top_p | float | No | — | Nucleus sampling parameter. An alternative to temperature sampling |
| presence_penalty | float | No | — | Penalize tokens based on whether they have appeared in the text so far |
| frequency_penalty | float | No | — | Penalize tokens based on how frequently they appear in the text so far |
| n | integer | No | 1 | Number of completions to generate. Only 1 is supported; requests with n > 1 are rejected |
| user | string | No | — | A unique identifier for the end-user, used for tracking |
| max_loops | integer | No | 1 | Swarms extension. Maximum number of agent reasoning loops. 1 = single pass (default). Higher values let the agent iterate on its own output. Pass via extra_body in the OpenAI SDK |

ChatMessage Object

Each message in the messages array:
| Field | Type | Required | Description |
|---|---|---|---|
| role | string | Yes | One of system, user, or assistant |
| content | string or List[ContentPart] | Yes | Text content, or an array of content parts for multimodal input |
| name | string | No | An optional name for the participant |

ContentPart (Multimodal)

When content is an array, each element is a content part.

Text part:

{"type": "text", "text": "Describe this image."}

Image part:

{"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
The url field accepts both HTTPS URLs and base64-encoded data URIs (data:image/png;base64,...).
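Assembled into a full message, a multimodal request might look like the following sketch. The image URL is a placeholder, and the message can be passed in the messages array exactly like the text-only examples in the Code Examples section:

```python
# Build a multimodal user message: one text part plus one image part.
# The image URL below is a placeholder, not a real asset.
multimodal_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image."},
        {
            "type": "image_url",
            "image_url": {"url": "https://example.com/photo.jpg"},
        },
    ],
}

# Pass it like any other message, e.g.:
# response = client.chat.completions.create(model="gpt-4o", messages=[multimodal_message])
print(multimodal_message["content"][1]["type"])  # image_url
```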

Validation Rules

  • At least one message with role: "user" is required
  • n must be 1 — multiple completions per request are not supported (send separate requests instead)
  • Requests with zero messages or only system messages are rejected
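A client-side pre-check implementing these rules might look like this sketch (illustrative only, not the server's actual validation code):

```python
def validate_chat_request(body: dict) -> None:
    """Raise ValueError if the request would be rejected (illustrative sketch)."""
    messages = body.get("messages", [])
    # Rules 1 and 3: at least one user message is required, so requests
    # with zero messages or only system messages also fail here.
    if not any(m.get("role") == "user" for m in messages):
        raise ValueError("At least one message with role 'user' is required.")
    # Rule 2: only a single completion per request
    if body.get("n", 1) != 1:
        raise ValueError("n must be 1; send separate requests instead.")

validate_chat_request({"messages": [{"role": "user", "content": "Hello"}]})  # passes
```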

Example Request Body

{
  "model": "gpt-4o",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing in simple terms."}
  ],
  "temperature": 0.5,
  "max_tokens": 1024,
  "stream": false,
  "max_loops": 1
}

Response Schema

ChatCompletionResponse Object (Non-Streaming)

| Field | Type | Description |
|---|---|---|
| id | string | Unique completion identifier, prefixed with chatcmpl- |
| object | string | Always "chat.completion" |
| created | integer | Unix timestamp of when the completion was generated |
| model | string | The model that was used (echoes back the requested model name) |
| choices | List[Choice] | Array containing the completion result (always one element) |
| usage | CompletionUsage | Token usage counts for billing |

Choice Object

| Field | Type | Description |
|---|---|---|
| index | integer | Always 0 (single-choice responses) |
| message | ChatMessage | The assistant's response with role: "assistant" |
| finish_reason | string | Why the model stopped generating; "stop" for normal completion |

CompletionUsage Object

| Field | Type | Description |
|---|---|---|
| prompt_tokens | integer | Number of tokens in the input (system prompt + history + task) |
| completion_tokens | integer | Number of tokens in the generated response |
| total_tokens | integer | Sum of prompt_tokens and completion_tokens |

Example Response

{
  "id": "chatcmpl-a1b2c3d4e5f6789012345678901",
  "object": "chat.completion",
  "created": 1711300000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses quantum bits (qubits) that can exist in multiple states simultaneously..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 128,
    "total_tokens": 170
  }
}

Streaming Response Schema

When stream: true is set, the response is returned as Server-Sent Events (SSE). Each event is a data: line containing a JSON chunk.

StreamChunk Object

| Field | Type | Description |
|---|---|---|
| id | string | Same chatcmpl- ID shared across all chunks in the stream |
| object | string | Always "chat.completion.chunk" |
| created | integer | Unix timestamp (same across all chunks) |
| model | string | The requested model name |
| choices | List[StreamChoice] | Array with one element containing the delta |

StreamChoice Object

| Field | Type | Description |
|---|---|---|
| index | integer | Always 0 |
| delta | object | Incremental content (see stream sequence below) |
| finish_reason | string or null | null during streaming, "stop" on the final chunk |

Stream Sequence

| Order | delta | finish_reason | Purpose |
|---|---|---|---|
| First chunk | {"role": "assistant"} | null | Role declaration |
| Content chunks | {"content": "..."} | null | Incremental text content |
| Final chunk | {} | "stop" | Signals completion |
| Terminator | data: [DONE] | — | SSE stream end marker |

Example Stream

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1711300000,"model":"gpt-4o","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1711300000,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"Quantum"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1711300000,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1711300000,"model":"gpt-4o","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
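If you are not using the OpenAI SDK, a stream like the one above can be consumed by hand: each chunk payload is standard JSON on a data: line. This sketch parses the lines and joins the content deltas:

```python
import json

def iter_stream_content(sse_lines):
    """Yield incremental text from 'data:' SSE lines, stopping at [DONE]."""
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # SSE stream end marker
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

# The example stream above, as raw lines:
sample = [
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1711300000,"model":"gpt-4o","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1711300000,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"Quantum"},"finish_reason":null}]}',
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1711300000,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}',
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1711300000,"model":"gpt-4o","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
print("".join(iter_stream_content(sample)))  # Quantum computing
```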

Error Response Schema

Errors are returned in the standard OpenAI error format, so the OpenAI SDK's built-in error classes work as expected.

Error Object

| Field | Type | Description |
|---|---|---|
| error.message | string | Human-readable error description |
| error.type | string | Error category (see table below) |
| error.code | string or null | Machine-readable error code |
| error.param | string or null | The parameter that caused the error |

Error Types

| HTTP Status | type | When |
|---|---|---|
| 400 | invalid_request_error | Malformed request, validation failure, missing required fields |
| 401 | authentication_error | Missing or invalid API key |
| 403 | permission_error | Insufficient permissions or subscription tier |
| 429 | rate_limit_error | Rate limit exceeded |
| 500 | server_error | Internal error during agent execution |

Example Error Response

{
  "error": {
    "message": "At least one message with role 'user' is required.",
    "type": "invalid_request_error",
    "code": "invalid_request",
    "param": null
  }
}

Code Examples

Non-Streaming Completion

from openai import OpenAI

client = OpenAI(
    api_key="your-swarms-api-key",
    base_url="https://api.swarms.world/v1",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What are the key trends in renewable energy?"},
    ],
    max_tokens=1024,
    temperature=0.5,
)

print(response.choices[0].message.content)
print(f"Tokens used: {response.usage.total_tokens}")

Streaming Completion

from openai import OpenAI

client = OpenAI(
    api_key="your-swarms-api-key",
    base_url="https://api.swarms.world/v1",
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku about AI agents."}],
    stream=True,
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
print()

Multi-Turn Conversation

from openai import OpenAI

client = OpenAI(
    api_key="your-swarms-api-key",
    base_url="https://api.swarms.world/v1",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a math tutor."},
        {"role": "user", "content": "What is the derivative of x^2?"},
        {"role": "assistant", "content": "The derivative of x^2 is 2x."},
        {"role": "user", "content": "What about x^3?"},
    ],
)

print(response.choices[0].message.content)

Error Handling

from openai import (
    OpenAI,
    APIError,
    AuthenticationError,
    BadRequestError,
    PermissionDeniedError,
    RateLimitError,
)

client = OpenAI(
    api_key="your-swarms-api-key",
    base_url="https://api.swarms.world/v1",
)

try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(response.choices[0].message.content)
except AuthenticationError:
    print("Missing or invalid API key (401)")
except PermissionDeniedError:
    print("Insufficient permissions or subscription tier (403)")
except BadRequestError as e:
    print(f"Validation error (400): {e.message}")
except RateLimitError:
    print("Rate limited — back off and retry")
except APIError as e:
    print(f"API error ({e.status_code}): {e.message}")

Multi-Loop Reasoning

By default the agent runs a single pass (max_loops=1). To let the agent iterate on its own output — useful for complex reasoning, self-correction, or multi-step tasks — pass max_loops via the OpenAI SDK’s extra_body parameter:
from openai import OpenAI

client = OpenAI(
    api_key="your-swarms-api-key",
    base_url="https://api.swarms.world/v1",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a rigorous analyst. Think step by step, then review and refine your answer."},
        {"role": "user", "content": "What are the second-order effects of raising the federal funds rate by 50 basis points?"},
    ],
    max_tokens=2048,
    extra_body={"max_loops": 3},
)

print(response.choices[0].message.content)
max_loops is a Swarms extension field — it is not part of the OpenAI API spec. In the Python OpenAI SDK, use extra_body={"max_loops": N} to pass it. In cURL or raw HTTP, include it directly in the JSON body.
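For raw HTTP, the field simply sits at the top level of the JSON body. A sketch using the requests library as one possible client (the live call is commented out):

```python
# import requests  # any HTTP client works; shown here as one option

payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Analyze this step by step."}],
    "max_tokens": 2048,
    "max_loops": 3,  # Swarms extension field, sent directly in the JSON body
}

# response = requests.post(
#     "https://api.swarms.world/v1/chat/completions",
#     headers={"x-api-key": "your-swarms-api-key", "Content-Type": "application/json"},
#     json=payload,
# )
# print(response.json()["choices"][0]["message"]["content"])
```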

How It Maps to Swarms Internals

For users already familiar with the native Swarms API, here is how the OpenAI request fields map to AgentCompletion and AgentSpec:
| OpenAI Field | Swarms Equivalent | Notes |
|---|---|---|
| model | AgentSpec.model_name | Passed through as-is |
| messages (system) | AgentSpec.system_prompt | Defaults to "You are a helpful assistant." if absent |
| messages (last user) | AgentCompletion.task | The actual prompt the agent runs on |
| messages (prior turns) | AgentCompletion.history | User and assistant messages before the final user message |
| messages (image_url parts) | AgentCompletion.img / imgs | Extracted from multimodal content parts |
| temperature | AgentSpec.temperature | Defaults to 0.5 |
| max_tokens / max_completion_tokens | AgentSpec.max_tokens | max_completion_tokens takes precedence; defaults to 8192 |
| top_p | AgentSpec.llm_args.top_p | Passed through to the underlying LLM |
| presence_penalty | AgentSpec.llm_args.presence_penalty | Passed through to the underlying LLM |
| frequency_penalty | AgentSpec.llm_args.frequency_penalty | Passed through to the underlying LLM |
| max_loops | AgentSpec.max_loops | Defaults to 1. Higher values enable multi-loop reasoning |
| stream | Route dispatch | true returns StreamingResponse with SSE; false returns JSON |
The agent is created with max_loops set to the requested value (defaults to 1 for single-turn) and streaming_on=False (the agent itself runs to completion; streaming is simulated at the HTTP layer by chunking the result).
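The message-splitting rows of the table can be illustrated with a small sketch. This mirrors the mapping described above for illustration only; it is not the server's actual code:

```python
def split_messages(messages):
    """Map an OpenAI messages array onto (system_prompt, task, history)."""
    system_prompt = "You are a helpful assistant."  # default when no system message
    turns = []
    for m in messages:
        if m["role"] == "system":
            system_prompt = m["content"]
        else:
            turns.append(m)
    task = turns[-1]["content"]   # last user message becomes the task
    history = turns[:-1]          # earlier user/assistant turns become history
    return system_prompt, task, history

sp, task, history = split_messages([
    {"role": "system", "content": "You are a math tutor."},
    {"role": "user", "content": "What is the derivative of x^2?"},
    {"role": "assistant", "content": "The derivative of x^2 is 2x."},
    {"role": "user", "content": "What about x^3?"},
])
print(task)  # What about x^3?
```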

Supported Models

The model field accepts any model supported by the Swarms API. Common options:
| Provider | Models |
|---|---|
| OpenAI | gpt-4o, gpt-4o-mini, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o3-mini |
| Anthropic | claude-sonnet-4-20250514, claude-3-7-sonnet-latest |
| Groq | groq/llama3-70b-8192, groq/deepseek-r1-distill-llama-70b |
For the full list, call GET /v1/models/available with your API key.
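One way to fetch that list programmatically, sketched with the requests library as one possible client (the live call is commented out; the response shape is not documented on this page):

```python
# import requests  # one possible HTTP client

url = "https://api.swarms.world/v1/models/available"
headers = {"x-api-key": "your-swarms-api-key"}

# resp = requests.get(url, headers=headers)
# print(resp.json())
```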

Differences from the OpenAI API

| Behavior | OpenAI API | Swarms API |
|---|---|---|
| n > 1 | Returns multiple choices | Rejected with an error; send separate requests |
| Tool calling / function calling | Supported | Not supported on this endpoint. Use /v1/agent/completions with tools_list_dictionary |
| logprobs | Supported | Not supported |
| Response format (json_object) | Supported | Not supported on this endpoint. Use /v1/agent/completions with structured output |
| Streaming | True token-by-token streaming | Simulated; the agent runs to completion, then the result is delivered in chunks |
| max_loops | Not applicable | Swarms extension for multi-loop agent reasoning (pass via extra_body) |

Billing

Usage is metered and billed identically to the native /v1/agent/completions endpoint:
  • Input tokens are counted from the combined system prompt, conversation history, and task
  • Output tokens are counted from the agent’s response
  • Credits are deducted automatically after each completion
  • The usage field in the response shows the exact token counts
Check your balance anytime with GET /v1/users/me/credits.
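A balance check can be sketched the same way; the Bearer header is used here, though x-api-key works too. The live call is commented out and the credits payload schema is not documented on this page, so the sketch just prints the raw JSON:

```python
# import requests  # one possible HTTP client

url = "https://api.swarms.world/v1/users/me/credits"
headers = {"Authorization": "Bearer your-swarms-api-key"}

# resp = requests.get(url, headers=headers)
# print(resp.json())  # raw credits payload
```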