## Endpoint
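The request/response schema below matches Anthropic's Messages API, so the route is presumably the same (the exact path is an assumption, inferred from that schema and the `GET /v1/models` reference below):

```
POST /v1/messages
```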
## Request Body
`model` (string)

The model to use for generation. Examples:

- `claude-opus-4-6-thinking`
- `claude-sonnet-4-5-thinking`
- `gemini-3-flash`

Use `GET /v1/models` to see all available models.

`messages` (array)

Array of message objects representing the conversation history. Each message has:

- `role` (string): Either `user` or `assistant`
- `content` (string | array): Message content as text or an array of content blocks

`max_tokens` (number)

Maximum number of tokens to generate in the response. For Gemini models, this is automatically capped at 16384 (Gemini's limit).
`stream` (boolean)

Enable streaming mode. When `true`, the response is sent as Server-Sent Events (SSE).

`system` (string)

System instruction to guide the model's behavior.

`tools` (array)

Array of tool definitions for function calling. Each tool has:

- `name` (string): Tool name
- `description` (string): What the tool does
- `input_schema` (object): JSON Schema for the tool's parameters

`tool_choice` (object)

Control which tool the model should use:

- `{"type": "auto"}` - Model decides (default)
- `{"type": "any"}` - Model must use a tool
- `{"type": "tool", "name": "tool_name"}` - Use a specific tool

`thinking` (object)

Enable extended thinking for supported models:
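A minimal sketch, assuming the proxy mirrors Anthropic's extended-thinking parameter format (the `budget_tokens` field and its value are illustrative):

```json
{
  "thinking": {
    "type": "enabled",
    "budget_tokens": 10000
  }
}
```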
`temperature` (number)

Sampling temperature. Higher values make output more random.

`top_p` (number)

Nucleus sampling threshold.

`top_k` (number)

Top-K sampling parameter (Gemini only).
## Response
### Non-Streaming Response
`id` (string)

Unique message identifier.

`type` (string)

Always `"message"`.

`role` (string)

Always `"assistant"`.

`content` (array)

Array of content blocks. Each block can be:

- Text block: `{"type": "text", "text": "..."}`
- Thinking block: `{"type": "thinking", "thinking": "...", "signature": "..."}`
- Tool use block: `{"type": "tool_use", "id": "...", "name": "...", "input": {...}}`
`model` (string)

The model that generated the response.

`stop_reason` (string)

Why the model stopped generating:

- `"end_turn"` - Natural completion
- `"max_tokens"` - Hit the token limit
- `"tool_use"` - Model called a tool
- `"stop_sequence"` - Hit a stop sequence

`usage` (object)

Token usage statistics:

- `input_tokens` (number): Tokens in the prompt
- `output_tokens` (number): Tokens generated
- `cache_creation_input_tokens` (number): Tokens cached (if prompt caching is used)
- `cache_read_input_tokens` (number): Tokens read from cache
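Putting these fields together, a non-streaming response looks roughly like this (the message ID, model name, and token counts are illustrative):

```json
{
  "id": "msg_01XYZ...",
  "type": "message",
  "role": "assistant",
  "content": [
    {"type": "text", "text": "Hello! How can I help you today?"}
  ],
  "model": "claude-sonnet-4-5-thinking",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 12,
    "output_tokens": 9
  }
}
```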
### Streaming Response
When `stream: true`, the response is sent as Server-Sent Events:
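A sketch of the event stream, assuming the proxy follows Anthropic's SSE event sequence (payloads are abbreviated for illustration):

```text
event: message_start
data: {"type": "message_start", "message": {"id": "msg_01XYZ...", "role": "assistant", ...}}

event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hello"}}

event: content_block_stop
data: {"type": "content_block_stop", "index": 0}

event: message_delta
data: {"type": "message_delta", "delta": {"stop_reason": "end_turn"}, "usage": {"output_tokens": 9}}

event: message_stop
data: {"type": "message_stop"}
```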
## Examples
### Basic Request
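A minimal sketch, assuming the proxy listens on `http://localhost:8080` and accepts an Anthropic-style `x-api-key` header (host, port, and auth header are all assumptions; adjust to your deployment):

```bash
curl http://localhost:8080/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $API_KEY" \
  -d '{
    "model": "claude-sonnet-4-5-thinking",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```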
### Streaming Request
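The same sketch with `stream: true`; curl's `-N` disables output buffering so SSE events print as they arrive (host and auth remain assumptions):

```bash
curl -N http://localhost:8080/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $API_KEY" \
  -d '{
    "model": "claude-sonnet-4-5-thinking",
    "max_tokens": 1024,
    "stream": true,
    "messages": [
      {"role": "user", "content": "Write a haiku about proxies."}
    ]
  }'
```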
### With Tools
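A sketch of a tool-calling request using the `tools` and `tool_choice` parameters described above (the `get_weather` tool is hypothetical):

```bash
curl http://localhost:8080/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $API_KEY" \
  -d '{
    "model": "claude-sonnet-4-5-thinking",
    "max_tokens": 1024,
    "tools": [{
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }],
    "tool_choice": {"type": "auto"},
    "messages": [
      {"role": "user", "content": "What is the weather in Paris?"}
    ]
  }'
```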
## Prompt Caching
The proxy automatically handles prompt caching to reduce latency and token usage:

- Caching is organization-scoped (requires the same account + session ID)
- The session ID is derived from the SHA256 hash of the first user message (see the sketch after this list)
- Cached tokens are reported in `usage.cache_read_input_tokens`
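A rough illustration of the session-ID derivation in shell; whether the proxy hashes the raw message text exactly like this, or a canonicalized form of it, is an assumption:

```bash
# Hypothetical: session ID as the SHA256 hex digest of the first user message
printf '%s' 'Hello!' | sha256sum | cut -d' ' -f1
```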
### How It Works
- First request with a conversation → creates a cache
- Subsequent requests with the same account → read from the cache
- If the account switches → cache miss; a new cache is created
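For example, the second request in a session should report cache hits in `usage` (token counts are illustrative):

```json
{
  "usage": {
    "input_tokens": 4,
    "output_tokens": 120,
    "cache_read_input_tokens": 1024
  }
}
```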
## Error Responses
### 400 Bad Request - Invalid Parameters
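The error bodies below are sketches assuming the proxy mirrors Anthropic's error envelope (`{"type": "error", "error": {...}}`); exact type strings and messages will vary:

```json
{
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "message": "max_tokens: must be a positive integer"
  }
}
```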
### 401 Unauthorized - Missing API Key
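Assuming the same envelope:

```json
{
  "type": "error",
  "error": {
    "type": "authentication_error",
    "message": "Missing API key"
  }
}
```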
### 503 Service Unavailable - All Accounts Exhausted
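Again a sketch; the `overloaded_error` type string is an assumption borrowed from Anthropic's error taxonomy:

```json
{
  "type": "error",
  "error": {
    "type": "overloaded_error",
    "message": "All accounts are currently unavailable"
  }
}
```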
### 400 Bad Request - Quota Exhausted
When all accounts are rate-limited for the requested model, the proxy returns 400 (not 429) to prevent clients from automatically retrying. This ensures Claude Code stops cleanly instead of entering a retry loop.
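An illustrative body under the same envelope assumption (the message text is hypothetical):

```json
{
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "message": "Quota exhausted for model claude-opus-4-6-thinking on all accounts"
  }
}
```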