Google Gemini Setup
AxonFlow supports Google's Gemini models for LLM routing and orchestration. Gemini provides multimodal capabilities with large context windows and competitive pricing.
Prerequisites
- Google Cloud account or AI Studio access
- API key from Google AI Studio
Quick Start
1. Get API Key
- Go to Google AI Studio
- Create or select a Google Cloud project
- Click "Create API Key"
- Copy the generated key
2. Configure AxonFlow
```bash
# Required
export GOOGLE_API_KEY=your-api-key-here

# Optional: Specify model (default: gemini-2.0-flash)
export GOOGLE_MODEL=gemini-2.5-flash
```
3. Start AxonFlow
```bash
docker compose up -d
```
Supported Models
| Model | Context Window | Best For |
|---|---|---|
| gemini-2.5-flash | 1M tokens | Latest, fastest model |
| gemini-2.5-pro | 2M tokens | Latest, highest quality |
| gemini-2.0-flash | 1M tokens | Fast, general-purpose (default) |
| gemini-2.0-flash-lite | 1M tokens | Cost-optimized, simple tasks |
| gemini-1.5-pro | 2M tokens | Complex reasoning, long context (legacy) |
| gemini-1.5-flash | 1M tokens | Balanced speed/quality (legacy) |
Configuration Options
Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| GOOGLE_API_KEY | Yes | - | Google AI API key |
| GOOGLE_MODEL | No | gemini-2.0-flash | Default model |
| GOOGLE_ENDPOINT | No | https://generativelanguage.googleapis.com | API endpoint |
| GOOGLE_TIMEOUT_SECONDS | No | 120 | Request timeout (seconds) |
YAML Configuration
For more control, use YAML configuration:
```yaml
# axonflow.yaml
llm_providers:
  gemini:
    enabled: true
    config:
      model: gemini-2.0-flash
      max_tokens: 8192
      timeout: 120s
    credentials:
      api_key: ${GOOGLE_API_KEY}
    priority: 8
    weight: 0.3
```
Capabilities
The Gemini provider supports:
- Chat completions - Conversational AI
- Streaming responses - Real-time token streaming
- Long context - Up to 2M tokens (Gemini 2.5 Pro)
- Vision - Image understanding (see the sketch after this list)
- Function calling - Tool use
- Code generation - Programming assistance
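As an illustration of the vision capability, an image can be passed to Gemini as inline base64 data when calling the @google/generative-ai SDK directly (gateway mode). A minimal sketch; the file name, MIME type, and prompt are placeholders:

```typescript
import { readFileSync } from 'node:fs';
import { GoogleGenerativeAI } from '@google/generative-ai';

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
const model = genAI.getGenerativeModel({ model: 'gemini-2.0-flash' });

// Pass an image alongside the text prompt as inline base64 data
const image = {
  inlineData: {
    data: readFileSync('diagram.png').toString('base64'),
    mimeType: 'image/png',
  },
};

const result = await model.generateContent(['Describe this diagram', image]);
console.log(result.response.text());
```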
Usage Examples
Proxy Mode (Python SDK)
Proxy mode routes requests through AxonFlow for simple integration:
```python
import asyncio

from axonflow import AxonFlow


async def main() -> None:
    async with AxonFlow(agent_url="http://localhost:8080") as client:
        # Execute query through AxonFlow (routes to the configured Gemini provider)
        response = await client.execute_query(
            user_token="user-123",
            query="Explain quantum computing",
            request_type="chat",
            context={"provider": "gemini", "model": "gemini-2.0-flash"},
        )
        print(response.content)


asyncio.run(main())
```
Proxy Mode (cURL)
```bash
curl -X POST http://localhost:8080/api/request \
  -H "Content-Type: application/json" \
  -H "X-User-Token: user-123" \
  -d '{
    "query": "What is machine learning?",
    "provider": "gemini",
    "model": "gemini-2.0-flash",
    "max_tokens": 500
  }'
```
Gateway Mode (TypeScript SDK)
Gateway mode gives you full control over the LLM call while AxonFlow handles policy enforcement and audit logging:
```typescript
import { AxonFlow } from '@axonflow/sdk';
import { GoogleGenerativeAI } from '@google/generative-ai';

const axonflow = new AxonFlow({
  endpoint: 'http://localhost:8080',
  apiKey: 'your-axonflow-key'
});

// 1. Pre-check: Get policy approval
const ctx = await axonflow.getPolicyApprovedContext({
  userToken: 'user-123',
  query: 'Explain quantum computing'
});

if (!ctx.approved) {
  throw new Error(`Blocked: ${ctx.blockReason}`);
}

// 2. Call Gemini directly
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY);
const model = genAI.getGenerativeModel({ model: 'gemini-2.0-flash' });
const result = await model.generateContent(ctx.approvedData.query);
const response = result.response.text();

// 3. Audit the call
await axonflow.auditLLMCall({
  contextId: ctx.contextId,
  responseSummary: response.substring(0, 100),
  provider: 'gemini',
  model: 'gemini-2.0-flash',
  tokenUsage: { promptTokens: 50, completionTokens: 100, totalTokens: 150 },
  latencyMs: 250
});
```
Streaming
Gemini supports server-sent events (SSE) for streaming responses. Use the Gemini SDK directly:
```typescript
import { GoogleGenerativeAI } from '@google/generative-ai';

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY);
const model = genAI.getGenerativeModel({ model: 'gemini-2.0-flash' });

const result = await model.generateContentStream('Write a long story');

for await (const chunk of result.stream) {
  const text = chunk.text();
  process.stdout.write(text);
}
```
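Streaming also composes with gateway mode: run the policy pre-check first, stream the response, then audit once the stream completes. A sketch combining the two patterns shown above (the token counts passed to auditLLMCall are illustrative, since streamed usage is not tallied here):

```typescript
import { AxonFlow } from '@axonflow/sdk';
import { GoogleGenerativeAI } from '@google/generative-ai';

const axonflow = new AxonFlow({ endpoint: 'http://localhost:8080', apiKey: 'your-axonflow-key' });

// Policy pre-check before streaming
const ctx = await axonflow.getPolicyApprovedContext({
  userToken: 'user-123',
  query: 'Write a long story'
});
if (!ctx.approved) {
  throw new Error(`Blocked: ${ctx.blockReason}`);
}

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
const model = genAI.getGenerativeModel({ model: 'gemini-2.0-flash' });

const started = Date.now();
const result = await model.generateContentStream(ctx.approvedData.query);

let fullText = '';
for await (const chunk of result.stream) {
  fullText += chunk.text();
  process.stdout.write(chunk.text());
}

// Audit after the stream finishes (token usage below is illustrative)
await axonflow.auditLLMCall({
  contextId: ctx.contextId,
  responseSummary: fullText.substring(0, 100),
  provider: 'gemini',
  model: 'gemini-2.0-flash',
  tokenUsage: { promptTokens: 50, completionTokens: 100, totalTokens: 150 },
  latencyMs: Date.now() - started
});
```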
Pricing
Gemini pricing (as of December 2025):
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Gemini 2.0 Flash | $0.10 | $0.40 |
| Gemini 1.5 Pro (up to 128K) | $1.25 | $5.00 |
| Gemini 1.5 Pro (over 128K) | $2.50 | $10.00 |
| Gemini 1.5 Flash | $0.075 | $0.30 |
AxonFlow provides cost estimation via the /api/cost/estimate endpoint.
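A sketch of calling that endpoint before sending a large prompt; only the path comes from this guide, and the request body fields (provider, model, input_tokens, max_output_tokens) are assumptions for illustration, not a documented schema:

```typescript
// Hypothetical request body: these field names are assumptions, not a documented schema.
const res = await fetch('http://localhost:8080/api/cost/estimate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    provider: 'gemini',
    model: 'gemini-2.0-flash',
    input_tokens: 200000,
    max_output_tokens: 1000
  })
});
console.log(await res.json());
```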
Multi-Provider Routing
Configure Gemini alongside other providers for intelligent routing:
```yaml
llm_providers:
  gemini:
    enabled: true
    config:
      model: gemini-2.0-flash
    credentials:
      api_key: ${GOOGLE_API_KEY}
    priority: 100
  openai:
    enabled: true
    config:
      model: gpt-4o
    credentials:
      api_key: ${OPENAI_API_KEY}
    priority: 50

routing:
  strategy: priority
  fallback_enabled: true
```
Health Checks
The Gemini provider reports its status via the AxonFlow health endpoint:

```bash
curl http://localhost:8081/health
```
Response includes Gemini provider status:
```json
{
  "status": "healthy",
  "providers": {
    "gemini": {
      "status": "healthy",
      "latency_ms": 45
    }
  }
}
```
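For automated monitoring, the same endpoint can be polled and the gemini entry inspected; a minimal sketch based on the response shape above:

```typescript
// Poll the AxonFlow health endpoint and inspect the Gemini provider entry
const res = await fetch('http://localhost:8081/health');
const health = await res.json();

const gemini = health.providers?.gemini;
if (!gemini || gemini.status !== 'healthy') {
  console.error('Gemini provider is unhealthy:', gemini);
} else {
  console.log(`Gemini healthy (latency ${gemini.latency_ms} ms)`);
}
```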
Error Handling
Common error codes from Gemini:
| Status | Reason | Action |
|---|---|---|
| 400 | Invalid request | Check request format |
| 401 | Invalid API key | Verify GOOGLE_API_KEY |
| 403 | Permission denied | Check API key permissions |
| 429 | Rate limit | Implement backoff/retry |
| 500 | Server error | Retry with exponential backoff |
AxonFlow automatically handles retries for transient errors (429, 500, 503).
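If you call Gemini directly in gateway mode, client-side backoff for 429 and 5xx responses is still worthwhile (see Best Practices below). A minimal sketch; the attempt count and delays are illustrative:

```typescript
import { GoogleGenerativeAI } from '@google/generative-ai';

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
const model = genAI.getGenerativeModel({ model: 'gemini-2.0-flash' });

// Retry with exponential backoff on transient failures (settings are illustrative)
async function generateWithRetry(prompt: string, maxAttempts = 3): Promise<string> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const result = await model.generateContent(prompt);
      return result.response.text();
    } catch (err) {
      if (attempt === maxAttempts) throw err;
      // In practice, only retry errors that map to 429/500/503
      await new Promise((resolve) => setTimeout(resolve, 500 * 2 ** attempt));
    }
  }
  throw new Error('unreachable');
}

console.log(await generateWithRetry('Explain quantum computing'));
```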
Best Practices
- Use appropriate models - Gemini Flash for speed, Pro for quality
- Set reasonable timeouts - 120s default is good for most use cases
- Enable fallback providers - Configure OpenAI/Anthropic as backup
- Monitor costs - Use AxonFlow's cost dashboard to track usage
- Handle rate limits - Implement client-side retry logic for high-volume apps
Troubleshooting
"API key not valid"
- Verify the key at Google AI Studio
- Ensure the key has Generative Language API enabled
"Model not found"
- Check the model name spelling (e.g., gemini-2.0-flash, not gemini-1.5-flash)
- Verify model availability in your region
- Note: Gemini 1.5 models are being deprecated; use 2.0+ models
"Quota exceeded"
- Check usage at Google Cloud Console
- Consider upgrading to a paid plan
- Free tier has limited requests per minute
Next Steps
- LLM Providers Overview - All supported providers
- AWS Bedrock Setup - Enterprise cloud provider
- Ollama Setup - Self-hosted deployment
- Custom Provider SDK - Build custom providers