When you configure multiple Google accounts, the proxy automatically distributes requests across them using configurable selection strategies. This maximizes throughput, avoids rate limits, and provides failover.
Selection Strategies
Choose a strategy based on your usage pattern:
Hybrid (Default)
Smart multi-signal selection using health scores, token buckets, quota awareness, and LRU freshness
Sticky
Cache-optimized: stays on the same account to maximize prompt cache hits
Round-Robin
Maximum throughput: rotates accounts on every request for balanced load
Strategy Comparison
| Strategy | Best For | Behavior | Prompt Caching |
|---|---|---|---|
| Hybrid | Most users | Intelligent selection based on account health, available tokens, quota levels, and rest time | Moderate |
| Sticky | Prompt caching | Stays on same account until rate-limited or unavailable (waits up to 2 minutes) | Excellent |
| Round-Robin | High throughput | Rotates to next account on every request, skips unavailable accounts | Poor |
Configuring Strategy
- CLI Flag
- Web Console
- Environment Variable
Set strategy when starting the server:
Strategy Details
Hybrid Strategy
The default strategy uses multiple signals to select the best account.
Scoring Formula:
Health Score (Weight: 2)
Tracks success/failure patterns for each account:
- Success: +5 points (max 100)
- Rate Limit: -15 points
- Failure: -10 points
- Passive Recovery: +1 point per 5 minutes of inactivity
- Minimum Usable: 30 points
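The health-score rules above can be sketched as follows. This is an illustrative reconstruction, not the proxy's actual code; the names `updateHealth`, `recoverHealth`, and `isUsable` are hypothetical.

```typescript
type HealthEvent = "success" | "rateLimit" | "failure";

const MAX_SCORE = 100;
const MIN_USABLE = 30;

// Apply the per-event deltas listed above, clamped to [0, 100].
function updateHealth(score: number, event: HealthEvent): number {
  const delta = { success: 5, rateLimit: -15, failure: -10 }[event];
  return Math.max(0, Math.min(MAX_SCORE, score + delta));
}

// Passive recovery: +1 point per full 5 minutes of inactivity.
function recoverHealth(score: number, idleMs: number): number {
  const points = Math.floor(idleMs / (5 * 60_000));
  return Math.min(MAX_SCORE, score + points);
}

// Accounts below 30 points are filtered out of normal selection.
function isUsable(score: number): boolean {
  return score >= MIN_USABLE;
}
```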
Token Bucket (Weight: 5)
Client-side rate limiting to prevent overwhelming the API:
- Max Tokens: 50 per account
- Regeneration: 6 tokens per minute
- Cost: 1 token per request
- Refund: Token returned if request fails
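A minimal token-bucket sketch using the numbers above (50 max, 6/minute regeneration, 1 per request, refund on failure). The class shape is illustrative, not the proxy's implementation.

```typescript
class TokenBucket {
  private tokens = 50;               // Max Tokens: 50 per account
  private lastRefill = Date.now();
  private static readonly MAX = 50;
  private static readonly PER_MINUTE = 6; // Regeneration: 6 tokens per minute

  private refill(now: number): void {
    const regenerated = ((now - this.lastRefill) / 60_000) * TokenBucket.PER_MINUTE;
    this.tokens = Math.min(TokenBucket.MAX, this.tokens + regenerated);
    this.lastRefill = now;
  }

  // Cost: 1 token per request; returns false when the bucket is empty.
  tryTake(now = Date.now()): boolean {
    this.refill(now);
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }

  // Refund: token returned if the request fails.
  refund(): void {
    this.tokens = Math.min(TokenBucket.MAX, this.tokens + 1);
  }
}
```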
Quota Awareness (Weight: 3)
Avoids accounts with critically low quota:
- Checks model-specific quota remaining fraction
- Accounts below threshold are excluded
- Threshold priority: per-model > per-account > global
- Default threshold: 0% (disabled)
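The threshold priority (per-model > per-account > global > default 0) can be resolved with a simple nullish-coalescing chain. Field names here are illustrative assumptions, not the proxy's actual config schema.

```typescript
interface QuotaConfig {
  globalThreshold?: number;            // fraction, 0–0.99
  perAccount?: Record<string, number>; // keyed by account ID
  perModel?: Record<string, number>;   // keyed by model ID
}

// Per-model wins, then per-account, then global, then the default of 0 (disabled).
function resolveThreshold(cfg: QuotaConfig, accountId: string, modelId: string): number {
  return cfg.perModel?.[modelId]
      ?? cfg.perAccount?.[accountId]
      ?? cfg.globalThreshold
      ?? 0;
}

// An account is excluded when its remaining quota fraction is at or below the threshold.
function passesQuotaFilter(remainingFraction: number, threshold: number): boolean {
  return remainingFraction > threshold;
}
```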
LRU Freshness (Weight: 0.1)
Prefers accounts that have rested longer:
- Score increases with time since last use
- Capped at 1 hour (3600 seconds)
- Prevents account “starvation”
- Lower weight ensures other signals dominate
Fallback Tiers
When no account passes every filter, the hybrid strategy relaxes the filters in stages:
- Normal: All filters active (health + tokens + quota)
- Quota Fallback: Bypasses quota filter (better to use critical quota than fail)
- Emergency Fallback: Bypasses health filter and adds a 250ms throttle delay
- Last Resort: Bypasses health AND token filters and adds a 500ms throttle delay
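Combining the four signals, the scoring formula can be read as a weighted sum. The weights come from the section headings above; the normalization of each signal is an assumption for illustration.

```typescript
interface AccountSignals {
  health: number;         // 0–100 health score
  tokens: number;         // 0–50 tokens available in the bucket
  quotaRemaining: number; // 0–1 remaining quota fraction for the requested model
  restSeconds: number;    // time since last use
}

// Weighted sum: health (2) + token bucket (5) + quota (3) + LRU freshness (0.1).
function hybridScore(s: AccountSignals): number {
  const lru = Math.min(s.restSeconds, 3600) / 3600; // capped at 1 hour
  return 2 * (s.health / 100)
       + 5 * (s.tokens / 50)
       + 3 * s.quotaRemaining
       + 0.1 * lru;
}
```

The low LRU weight means freshness only breaks ties; a healthy account with tokens and quota always outranks a rested but depleted one.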
Sticky Strategy
Optimized for prompt caching by maintaining account continuity.
Behavior:
- Stays on current account until it becomes unavailable
- Waits up to 2 minutes for short rate limits before switching
- Only switches when:
- Current account rate-limited for > 2 minutes
- Current account is invalid/disabled
- Another account is available immediately
Best For:
- Long conversations with context reuse
- Maximizing cache_read_input_tokens
- Reducing costs via prompt caching
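One plausible reading of the sticky switch rules is sketched below: stay on a usable account, wait out short rate limits when nobody else is free, and switch only under the conditions listed above. All names are hypothetical.

```typescript
interface Account {
  id: string;
  enabled: boolean;
  valid: boolean;
  rateLimitedUntil?: number; // epoch ms; unset when not rate-limited
}

const MAX_WAIT_MS = 2 * 60_000; // waits up to 2 minutes for short rate limits

function pickSticky(current: Account, others: Account[], now = Date.now()): Account | "wait" {
  const usable = (a: Account) =>
    a.enabled && a.valid && (a.rateLimitedUntil ?? 0) <= now;
  if (usable(current)) return current;          // stay on the same account
  const alternative = others.find(usable);
  const waitMs = (current.rateLimitedUntil ?? 0) - now;
  const shortLimit = current.enabled && current.valid && waitMs <= MAX_WAIT_MS;
  if (shortLimit && !alternative) return "wait"; // short rate limit: wait it out
  return alternative ?? current;                 // switch, or last-resort keep current
}
```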
Round-Robin Strategy
Maximizes throughput by distributing load evenly.
Behavior:
- Rotates to next account on every request
- Skips rate-limited, invalid, or disabled accounts
- Returns to first account after reaching the end
Best For:
- High-volume concurrent requests
- Minimizing per-account rate limit hits
- Testing with multiple accounts
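The rotate-and-skip behavior is a few lines of modular arithmetic. A minimal sketch, with an assumed `available` flag standing in for the real rate-limit/validity checks:

```typescript
function pickRoundRobin<T extends { available: boolean }>(
  accounts: T[],
  lastIndex: number,
): { account: T; index: number } | undefined {
  // Walk at most one full lap, starting just after the last-used account.
  for (let step = 1; step <= accounts.length; step++) {
    const index = (lastIndex + step) % accounts.length; // wraps back to the first
    if (accounts[index].available) return { account: accounts[index], index };
  }
  return undefined; // every account is rate-limited, invalid, or disabled
}
```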
Quota Protection
Set minimum quota thresholds to switch accounts before quota runs out.
Global Threshold
Server-wide default for all accounts and models:
- Web Console: Settings → Quota Protection → Global Threshold
- Config File (Default: ~/.config/antigravity-proxy/accounts.json)
- Default value: 0 (disabled)
Per-Account Threshold
Override the global threshold for specific accounts:
- Web Console: Accounts tab → Account card → Settings → Quota Threshold
- Config File:
Priority Order: Per-model > Per-account > Global > Default (0)
Thresholds are fractions (0-0.99) stored in config, displayed as percentages (0-99%) in the UI.
Session ID and Caching
The proxy derives session IDs from conversation context to enable prompt caching.
How It Works:
- Session ID = SHA-256 hash of the first user message content
- Same session ID used across conversation turns
- Cache is organization-scoped (requires same account)
- cache_read_input_tokens is returned when the cache hits
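Because the session ID hashes only the first user message, it stays stable as turns are appended. A sketch using Node's `crypto` module (the message shape is an assumption):

```typescript
import { createHash } from "node:crypto";

interface Message {
  role: "user" | "assistant";
  content: string;
}

// Session ID = SHA-256 hex digest of the first user message's content.
function deriveSessionId(messages: Message[]): string | undefined {
  const first = messages.find((m) => m.role === "user");
  if (!first) return undefined;
  return createHash("sha256").update(first.content).digest("hex");
}
```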
| Strategy | Session Consistency | Cache Hit Rate |
|---|---|---|
| Sticky | Excellent - same account throughout | Very High |
| Hybrid | Moderate - changes based on scoring | Medium |
| Round-Robin | Poor - rotates every request | Very Low |
Rate Limit Handling
Automatic Cooldown
Rate-limited accounts are automatically excluded until the reset time:
- Detection: 429 errors or RESOURCE_EXHAUSTED from the API
- Parse Reset Time: Extract from headers or error body
- Mark Account: Set modelRateLimits[modelId].isRateLimited = true
- Auto-Recovery: Clear the flag when resetTime expires
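The mark/recover cycle can be sketched as below. The `modelRateLimits` record mirrors the field named above; the function names and the shape of the record are assumptions, and real reset-time parsing from headers is omitted.

```typescript
interface ModelRateLimit {
  isRateLimited: boolean;
  resetTime: number; // epoch ms
}

const modelRateLimits: Record<string, ModelRateLimit> = {};

// Called after a 429 / RESOURCE_EXHAUSTED with the parsed reset time.
function markRateLimited(modelId: string, resetTime: number): void {
  modelRateLimits[modelId] = { isRateLimited: true, resetTime };
}

// Auto-recovery: the flag is cleared lazily once resetTime has passed.
function isAvailable(modelId: string, now = Date.now()): boolean {
  const limit = modelRateLimits[modelId];
  if (!limit || !limit.isRateLimited) return true;
  if (now >= limit.resetTime) {
    limit.isRateLimited = false;
    return true;
  }
  return false;
}
```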
Model-Specific Rate Limits
Rate limits are tracked per model, per account.
Monitoring Account Health
Web Console Dashboard
The Accounts tab shows real-time health data:
- Health Score: Current health points (0-100)
- Token Bucket: Available tokens / max tokens
- Quota Bars: Per-model quota with threshold markers
- Rate Limit Status: Active rate limits with countdown
- Last Used: Timestamp of most recent request
Health Inspector (Developer Mode)
Enable Developer Mode to see detailed strategy metrics:
- Settings → Developer Mode → Enable
- Accounts tab → Health Inspector panel appears
- Shows per-account:
- Health score and history
- Token bucket state
- Failure/success counts
- LRU timestamps
API Endpoint
CLI Management Reference
Monitor and manage accounts via the CLI.
Best Practices
For Long Conversations
Use the sticky strategy to maximize prompt cache hits.
Set per-account quota thresholds to ensure you don’t lose the cache mid-conversation.
For High-Volume Usage
Use the round-robin or hybrid strategy.
Add multiple accounts to distribute load.
For Production Deployments
Use hybrid strategy (default) with quota protection:
- Set global quota threshold:
- Monitor health via web console or API
- Set up alerts for account failures
- Add redundant accounts for failover
For Mixed Workloads
Use hybrid strategy and configure per-model thresholds:
- High threshold for expensive models (Opus): 25%
- Low threshold for cheap models (Flash): 5%
Troubleshooting
All accounts rate-limited
Check reset times in web console. If all accounts are exhausted:
- Wait for reset time (usually 1 hour)
- Add more accounts to increase quota pool
- Reduce request volume or use cheaper models
Quota depleting too quickly
Set quota protection thresholds:
- Global: Settings → Quota Protection
- Per-account: Account settings modal
- Per-model: Drag markers on quota bars
Strategy not switching accounts
If using sticky strategy:
- Check if current account is rate-limited
- Verify other accounts are enabled and valid
- Strategy waits up to 2 minutes for short rate limits
If using hybrid strategy:
- Check health scores in Health Inspector
- Verify token buckets aren’t depleted
- Check quota thresholds aren’t excluding all accounts
Health scores stuck at low values
Health scores recover passively over time (+1 per 5 minutes). To reset:
- Restart the proxy server:
- Or wait for passive recovery
- Successful requests give +5 points immediately
Next Steps
Account Management
Add and configure Google accounts
Web Console
Monitor usage and health visually
Available Models
Explore supported Claude and Gemini models
API Reference
Programmatic access to proxy endpoints