# Load Balancing

> Distribute requests across multiple Google accounts with intelligent selection strategies

When you configure multiple Google accounts, the proxy automatically distributes requests across them using configurable selection strategies. This maximizes throughput, helps avoid rate limits, and provides failover.

## Selection Strategies

Choose a strategy based on your usage pattern:

* **Hybrid**: Smart multi-signal selection using health scores, token buckets, quota awareness, and LRU freshness
* **Sticky**: Cache-optimized; stays on the same account to maximize prompt cache hits
* **Round-Robin**: Maximum throughput; rotates accounts on every request for balanced load

### Strategy Comparison

| Strategy        | Best For        | Behavior                                                                                     | Prompt Caching |
| --------------- | --------------- | -------------------------------------------------------------------------------------------- | -------------- |
| **Hybrid**      | Most users      | Intelligent selection based on account health, available tokens, quota levels, and rest time | Moderate       |
| **Sticky**      | Prompt caching  | Stays on same account until rate-limited or unavailable (waits up to 2 minutes)              | Excellent      |
| **Round-Robin** | High throughput | Rotates to next account on every request, skips unavailable accounts                         | Poor           |

## Configuring Strategy

Set strategy when starting the server:

```bash
# Hybrid (default)
acc start --strategy=hybrid

# Sticky (cache-optimized)
acc start --strategy=sticky

# Round-Robin (load-balanced)
acc start --strategy=round-robin
```

Change strategy from the web console:

1. Open web console: `http://localhost:8080`
2. Go to **Settings** → **Server**
3. Select **Account Selection Strategy**
4. Click **Save**
5. Restart proxy for changes to take effect

Set via environment variable:

```bash
STRATEGY=sticky acc start
```

## Strategy Details

### Hybrid Strategy

The default strategy uses multiple signals to select the best account:

**Scoring Formula:**

```
score = (Health × 2) + ((Tokens / MaxTokens × 100) × 5) + (Quota × 3) + (LRU × 0.1)
```

**Health Score:** Tracks success/failure patterns for each account:

* **Success**: +5 points (max 100)
* **Rate Limit**: -15 points
* **Failure**: -10 points
* **Passive Recovery**: +1 point per 5 minutes of inactivity
* **Minimum Usable**: 30 points

Accounts below the minimum threshold are excluded unless all accounts are unhealthy (emergency fallback).

**Token Bucket:** Client-side rate limiting to prevent overwhelming the API:

* **Max Tokens**: 50 per account
* **Regeneration**: 6 tokens per minute
* **Cost**: 1 token per request
* **Refund**: Token returned if request fails

Accounts with more available tokens are preferred.

**Quota Awareness:** Avoids accounts with critically low quota:

* Checks model-specific quota remaining fraction
* Accounts below threshold are excluded
* Threshold priority: per-model > per-account > global
* Default threshold: 0% (disabled)

See [Quota Protection](#quota-protection) for configuration.

**LRU Freshness:** Prefers accounts that have rested longer:

* Score increases with time since last use
* Capped at 1 hour (3600 seconds)
* Prevents account "starvation"
* Low weight (0.1) ensures other signals dominate

**Fallback Levels:**

When no accounts pass all filters, hybrid strategy progressively relaxes constraints:

1. **Normal**: All filters active (health + tokens + quota)
2. **Quota Fallback**: Bypasses quota filter (better to use critical quota than fail)
3. **Emergency Fallback**: Bypasses health filter + adds 250ms throttle delay
4. **Last Resort**: Bypasses health AND token filters + adds 500ms throttle delay

### Sticky Strategy

Optimized for prompt caching by maintaining account continuity:

**Behavior:**

* Stays on current account until it becomes unavailable
* Waits up to 2 minutes for short rate limits before switching
* Only switches when:
  * Current account rate-limited for > 2 minutes
  * Current account is invalid/disabled
  * Another account is available immediately

**Best For:**

* Long conversations with context reuse
* Maximizing `cache_read_input_tokens`
* Reducing costs via prompt caching

**Prompt Cache Continuity**

Sticky strategy maintains session ID consistency by staying on the same account. Session IDs are derived from the first user message hash, ensuring cache hits across conversation turns.

### Round-Robin Strategy

Maximizes throughput by distributing load evenly:

**Behavior:**

* Rotates to next account on every request
* Skips rate-limited, invalid, or disabled accounts
* Returns to first account after reaching the end

**Best For:**

* High-volume concurrent requests
* Minimizing per-account rate limit hits
* Testing with multiple accounts

Round-robin breaks prompt cache continuity, since each request may use a different account (different organization scope).
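
The round-robin behavior described above (rotate on every request, skip unavailable accounts, wrap around at the end) can be sketched roughly as follows. This is an illustration in Python, not the proxy's actual implementation: `Account`, `RoundRobinSelector`, and the `available` flag are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Account:
    """Hypothetical account record; not the proxy's real data model."""
    email: str
    available: bool = True  # False when rate-limited, invalid, or disabled


class RoundRobinSelector:
    """Rotates through accounts on every pick, skipping unavailable ones."""

    def __init__(self, accounts: list[Account]) -> None:
        self.accounts = accounts
        self._next = 0  # index of the account to try first on the next pick

    def pick(self) -> Optional[Account]:
        n = len(self.accounts)
        for offset in range(n):
            idx = (self._next + offset) % n
            if self.accounts[idx].available:
                # Next pick starts after the chosen account, wrapping at the end.
                self._next = (idx + 1) % n
                return self.accounts[idx]
        return None  # no account is currently usable


accounts = [
    Account("a@example.com"),
    Account("b@example.com", available=False),  # e.g. rate-limited
    Account("c@example.com"),
]
selector = RoundRobinSelector(accounts)
picks = [selector.pick().email for _ in range(4)]
print(picks)  # ['a@example.com', 'c@example.com', 'a@example.com', 'c@example.com']
```

Because consecutive requests land on different accounts, a conversation's turns rarely reach the same organization-scoped cache, which is why this strategy trades cache hits for throughput.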
## Quota Protection

Set minimum quota thresholds to switch accounts before quota runs out:

Server-wide default for all accounts and models:

**Web Console**: Settings → Quota Protection → Global Threshold

**Config File** (`~/.config/antigravity-proxy/accounts.json`):

```json
{
  "settings": {
    "globalQuotaThreshold": 0.10
  }
}
```

Here `0.10` switches accounts at 10% remaining. Default: `0` (disabled)

Override global threshold for specific accounts:

**Web Console**: Accounts tab → Account card → Settings → Quota Threshold

**Config File**:

```json
{
  "accounts": [
    {
      "email": "user@gmail.com",
      "quotaThreshold": 0.20
    }
  ]
}
```

Here `0.20` switches at 20% remaining for this account.

Set different thresholds for specific models on an account:

**Web Console**: Models tab → Drag threshold markers on quota bars

**Config File**:

```json
{
  "accounts": [
    {
      "email": "user@gmail.com",
      "modelQuotaThresholds": {
        "claude-opus-4-6-thinking": 0.25,
        "gemini-3-flash": 0.05
      }
    }
  ]
}
```

This sets a 25% threshold for Opus and a 5% threshold for Gemini.

**Priority Order**: Per-model > Per-account > Global > Default (0)

Thresholds are stored in config as fractions (0-0.99) and displayed in the UI as percentages (0-99%).

## Session ID and Caching

The proxy derives session IDs from conversation context to enable prompt caching:

**How It Works:**

1. Session ID = SHA256 hash of first user message content
2. Same session ID used across conversation turns
3. Cache is organization-scoped (requires same account)
4. `cache_read_input_tokens` is returned on a cache hit

**Strategy Impact:**

| Strategy        | Session Consistency                 | Cache Hit Rate |
| --------------- | ----------------------------------- | -------------- |
| **Sticky**      | Excellent - same account throughout | Very High      |
| **Hybrid**      | Moderate - changes based on scoring | Medium         |
| **Round-Robin** | Poor - rotates every request        | Very Low       |

For conversations where you want to maximize caching, use sticky strategy:

```bash
acc start --strategy=sticky
```

## Rate Limit Handling

### Automatic Cooldown

Rate-limited accounts are automatically excluded until reset time:

1. **Detection**: 429 errors or `RESOURCE_EXHAUSTED` from API
2. **Parse Reset Time**: Extract from headers or error body
3. **Mark Account**: Set `modelRateLimits[modelId].isRateLimited = true`
4. **Auto-Recovery**: Clear flag when `resetTime` expires

### Model-Specific Rate Limits

Rate limits are tracked per model, per account:

```json
{
  "email": "user@gmail.com",
  "modelRateLimits": {
    "claude-opus-4-6-thinking": {
      "isRateLimited": true,
      "resetTime": 1709467800000,
      "lastError": "RESOURCE_EXHAUSTED"
    }
  }
}
```

`resetTime` is a Unix timestamp in milliseconds. An account rate-limited on Opus can still be used for Sonnet or Gemini models.

## Monitoring Account Health

### Web Console Dashboard

The Accounts tab shows real-time health data:

* **Health Score**: Current health points (0-100)
* **Token Bucket**: Available tokens / max tokens
* **Quota Bars**: Per-model quota with threshold markers
* **Rate Limit Status**: Active rate limits with countdown
* **Last Used**: Timestamp of most recent request

### Health Inspector (Developer Mode)

Enable Developer Mode to see detailed strategy metrics:

1. Settings → Developer Mode → Enable
2. Accounts tab → **Health Inspector** panel appears
3. Shows per-account:
   * Health score and history
   * Token bucket state
   * Failure/success counts
   * LRU timestamps

### API Endpoint

```bash
# Requires Developer Mode enabled
curl http://localhost:8080/api/strategy/health
```

Returns:

```json
{
  "strategy": "hybrid",
  "accounts": [
    {
      "email": "user@gmail.com",
      "health": 85,
      "tokens": 42,
      "maxTokens": 50,
      "lastUsed": 1709377800000,
      "failures": 2,
      "successes": 47
    }
  ]
}
```

## CLI Management Reference

Monitor and manage accounts via CLI:

```bash
# List all accounts with status
antigravity-claude-proxy accounts list

# Verify tokens are valid
antigravity-claude-proxy accounts verify

# Check quota and limits (table format)
curl "http://localhost:8080/account-limits?format=table"

# Interactive account menu
antigravity-claude-proxy accounts
```

## Best Practices

For long conversations, use **sticky strategy** to maximize prompt cache hits:

```bash
acc start --strategy=sticky
```

Set per-account quota thresholds to ensure you don't lose cache mid-conversation (here, switch at 15%):

```json
{
  "quotaThreshold": 0.15
}
```

For high throughput, use **round-robin** or **hybrid** strategy:

```bash
acc start --strategy=round-robin
```

Add multiple accounts to distribute load:

```bash
antigravity-claude-proxy accounts add
```

Use **hybrid strategy** (default) with quota protection:

1. Set global quota threshold:
   ```json
   {"globalQuotaThreshold": 0.10}
   ```
2. Monitor health via web console or API
3. Set up alerts for account failures
4. Add redundant accounts for failover

Use **hybrid strategy** and configure per-model thresholds:

* High threshold for expensive models (Opus): 25%
* Low threshold for cheap models (Flash): 5%

```json
{
  "modelQuotaThresholds": {
    "claude-opus-4-6-thinking": 0.25,
    "gemini-3-flash": 0.05
  }
}
```

## Troubleshooting

Check reset times in web console. If all accounts are exhausted:

1. Wait for reset time (usually 1 hour)
2. Add more accounts to increase quota pool
3. Reduce request volume or use cheaper models

Hybrid strategy will enter "last resort" mode with throttling delays.

To keep accounts from running out of quota, set quota protection thresholds:

1. Global: Settings → Quota Protection
2. Per-account: Account settings modal
3. Per-model: Drag markers on quota bars

Accounts will switch before quota runs out.

If using **sticky strategy**:

* Check if current account is rate-limited
* Verify other accounts are enabled and valid
* Strategy waits up to 2 minutes for short rate limits

If using **hybrid strategy**:

* Check health scores in Health Inspector
* Verify token buckets aren't depleted
* Check quota thresholds aren't excluding all accounts

Health scores recover passively over time (+1 per 5 minutes). To reset:

1. Restart the proxy server:
   ```bash
   acc restart
   ```
2. Or wait for passive recovery
3. Successful requests give +5 points immediately

## Next Steps

* Add and configure Google accounts
* Monitor usage and health visually
* Explore supported Claude and Gemini models
* Programmatic access to proxy endpoints