
Google Gemini Setup

AxonFlow supports Google's Gemini models for LLM routing and orchestration. Gemini provides multimodal capabilities with large context windows and competitive pricing.

Prerequisites

  • A Google account with access to Google AI Studio
  • A running AxonFlow deployment (the Quick Start below uses Docker Compose)

Quick Start

1. Get API Key

  1. Go to Google AI Studio (https://aistudio.google.com)
  2. Create or select a Google Cloud project
  3. Click "Create API Key"
  4. Copy the generated key

2. Configure AxonFlow

# Required
export GOOGLE_API_KEY=your-api-key-here

# Optional: Specify model (default: gemini-2.0-flash)
export GOOGLE_MODEL=gemini-2.5-flash

3. Start AxonFlow

docker compose up -d
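
Once the stack is up, you can confirm the Gemini provider registered by querying the health endpoint (port 8081, as described under Health Checks below):

curl http://localhost:8081/health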

Supported Models

| Model | Context Window | Best For |
| --- | --- | --- |
| gemini-2.5-flash | 1M tokens | Latest, fastest model |
| gemini-2.5-pro | 1M tokens | Latest, highest quality |
| gemini-2.0-flash | 1M tokens | Fast, general-purpose (default) |
| gemini-2.0-flash-lite | 1M tokens | Cost-optimized, simple tasks |
| gemini-1.5-pro | 2M tokens | Complex reasoning, long context (legacy) |
| gemini-1.5-flash | 1M tokens | Balanced speed/quality (legacy) |

Configuration Options

Environment Variables

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| GOOGLE_API_KEY | Yes | - | Google AI API key |
| GOOGLE_MODEL | No | gemini-2.0-flash | Default model |
| GOOGLE_ENDPOINT | No | https://generativelanguage.googleapis.com | API endpoint |
| GOOGLE_TIMEOUT_SECONDS | No | 120 | Request timeout (seconds) |
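
For example, to keep the default model and endpoint but fail faster on slow requests (the 60-second value is illustrative):

export GOOGLE_API_KEY=your-api-key-here
export GOOGLE_TIMEOUT_SECONDS=60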

YAML Configuration

For more control, use YAML configuration:

# axonflow.yaml
llm_providers:
  gemini:
    enabled: true
    config:
      model: gemini-2.0-flash
      max_tokens: 8192
      timeout: 120s
    credentials:
      api_key: ${GOOGLE_API_KEY}
    priority: 8   # Provider preference used by the routing layer (see Multi-Provider Routing below)
    weight: 0.3   # Relative weight for routing strategies that use it

Capabilities

The Gemini provider supports:

  • Chat completions - Conversational AI
  • Streaming responses - Real-time token streaming
  • Long context - Up to 1M tokens on current models (2M on legacy Gemini 1.5 Pro)
  • Vision - Image understanding (see the sketch after this list)
  • Function calling - Tool use
  • Code generation - Programming assistance
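
The vision capability maps to multimodal inputs in the Gemini SDK. A minimal sketch using @google/generative-ai directly (the file name chart.png is a placeholder):

import { readFileSync } from 'node:fs';
import { GoogleGenerativeAI } from '@google/generative-ai';

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
const model = genAI.getGenerativeModel({ model: 'gemini-2.0-flash' });

// Inline image data must be base64-encoded and tagged with a MIME type.
const image = {
  inlineData: {
    data: readFileSync('chart.png').toString('base64'),
    mimeType: 'image/png',
  },
};

const result = await model.generateContent(['Describe this chart.', image]);
console.log(result.response.text());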

Usage Examples

Proxy Mode (Python SDK)

Proxy mode routes requests through AxonFlow for simple integration:

from axonflow import AxonFlow

async with AxonFlow(agent_url="http://localhost:8080") as client:
    # Execute query through AxonFlow (routes to configured Gemini provider)
    response = await client.execute_query(
        user_token="user-123",
        query="Explain quantum computing",
        request_type="chat",
        context={"provider": "gemini", "model": "gemini-2.0-flash"}
    )
    print(response.content)

Proxy Mode (cURL)

curl -X POST http://localhost:8080/api/request \
  -H "Content-Type: application/json" \
  -H "X-User-Token: user-123" \
  -d '{
    "query": "What is machine learning?",
    "provider": "gemini",
    "model": "gemini-2.0-flash",
    "max_tokens": 500
  }'

Gateway Mode (TypeScript SDK)

Gateway mode gives you full control over the LLM call while AxonFlow handles policy enforcement and audit logging:

import { AxonFlow } from '@axonflow/sdk';
import { GoogleGenerativeAI } from '@google/generative-ai';

const axonflow = new AxonFlow({
  endpoint: 'http://localhost:8080',
  apiKey: 'your-axonflow-key'
});

// 1. Pre-check: Get policy approval
const ctx = await axonflow.getPolicyApprovedContext({
  userToken: 'user-123',
  query: 'Explain quantum computing'
});

if (!ctx.approved) {
  throw new Error(`Blocked: ${ctx.blockReason}`);
}

// 2. Call Gemini directly
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
const model = genAI.getGenerativeModel({ model: 'gemini-2.0-flash' });
const result = await model.generateContent(ctx.approvedData.query);
const response = result.response.text();

// 3. Audit the call (token counts below are illustrative; in practice,
// read them from result.response.usageMetadata)
await axonflow.auditLLMCall({
  contextId: ctx.contextId,
  responseSummary: response.substring(0, 100),
  provider: 'gemini',
  model: 'gemini-2.0-flash',
  tokenUsage: { promptTokens: 50, completionTokens: 100, totalTokens: 150 },
  latencyMs: 250
});

Streaming

Gemini supports server-sent events (SSE) for streaming responses. Use the Gemini SDK directly:

import { GoogleGenerativeAI } from '@google/generative-ai';

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
const model = genAI.getGenerativeModel({ model: 'gemini-2.0-flash' });

const result = await model.generateContentStream('Write a long story');

// Chunks arrive incrementally; write each one as it streams in.
for await (const chunk of result.stream) {
  const text = chunk.text();
  process.stdout.write(text);
}

Pricing

Gemini pricing (as of December 2025):

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
| --- | --- | --- |
| Gemini 2.0 Flash | $0.10 | $0.40 |
| Gemini 1.5 Pro (up to 128K) | $1.25 | $5.00 |
| Gemini 1.5 Pro (over 128K) | $2.50 | $10.00 |
| Gemini 1.5 Flash | $0.075 | $0.30 |
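
As a quick worked example: a Gemini 2.0 Flash request with 10,000 input tokens and 2,000 output tokens costs (10,000 / 1M) × $0.10 + (2,000 / 1M) × $0.40 = $0.0010 + $0.0008 = $0.0018.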

AxonFlow provides cost estimation via the /api/cost/estimate endpoint.
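
A request to that endpoint might look like the following; the field names in the body are assumptions for illustration and may differ in your AxonFlow version:

curl -X POST http://localhost:8080/api/cost/estimate \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "gemini",
    "model": "gemini-2.0-flash",
    "prompt_tokens": 10000,
    "completion_tokens": 2000
  }'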

Multi-Provider Routing

Configure Gemini alongside other providers for intelligent routing. With the priority strategy shown below, the highest-priority provider is tried first and lower-priority providers act as fallbacks:

llm_providers:
  gemini:
    enabled: true
    config:
      model: gemini-2.0-flash
    credentials:
      api_key: ${GOOGLE_API_KEY}
    priority: 100

  openai:
    enabled: true
    config:
      model: gpt-4o
    credentials:
      api_key: ${OPENAI_API_KEY}
    priority: 50

routing:
  strategy: priority
  fallback_enabled: true

Health Checks

The Gemini provider reports its health status through the AxonFlow health endpoint:

curl http://localhost:8081/health

Response includes Gemini provider status:

{
  "status": "healthy",
  "providers": {
    "gemini": {
      "status": "healthy",
      "latency_ms": 45
    }
  }
}

Error Handling

Common error codes from Gemini:

| Status | Reason | Action |
| --- | --- | --- |
| 400 | Invalid request | Check request format |
| 401 | Invalid API key | Verify GOOGLE_API_KEY |
| 403 | Permission denied | Check API key permissions |
| 429 | Rate limit | Implement backoff/retry |
| 500 | Server error | Retry with exponential backoff |

AxonFlow automatically handles retries for transient errors (429, 500, 503).
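
If you call Gemini directly in gateway mode, you need your own retry logic. Here is a minimal exponential-backoff sketch in TypeScript, assuming the thrown error exposes an HTTP status (the exact error shape depends on the SDK version):

// Retry a Gemini call on transient errors (429/500/503) with exponential backoff.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      // Assumption: transient failures surface an HTTP status on the error object.
      const status = err?.status ?? err?.response?.status;
      const transient = status === 429 || status === 500 || status === 503;
      if (!transient || attempt >= maxAttempts - 1) throw err;
      // Exponential backoff with jitter: ~1s, 2s, 4s, ... capped at 30s.
      const delayMs = Math.min(1000 * 2 ** attempt, 30_000) * (0.5 + Math.random());
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Usage: wrap any direct Gemini call.
// const result = await withRetry(() => model.generateContent('...'));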

Best Practices

  1. Use appropriate models - Gemini Flash for speed, Pro for quality
  2. Set reasonable timeouts - 120s default is good for most use cases
  3. Enable fallback providers - Configure OpenAI/Anthropic as backup
  4. Monitor costs - Use AxonFlow's cost dashboard to track usage
  5. Handle rate limits - Implement client-side retry logic for high-volume apps

Troubleshooting

"API key not valid"

  • Verify the key at Google AI Studio
  • Ensure the key has Generative Language API enabled

"Model not found"

  • Check the model name for typos (use the exact identifiers from the Supported Models table, e.g., gemini-2.0-flash)
  • Verify model availability in your region
  • Note: Gemini 1.5 models are being deprecated; use 2.0+ models

"Quota exceeded"

  • Check usage at Google Cloud Console
  • Consider upgrading to a paid plan
  • Free tier has limited requests per minute
