Google Gemini Setup
AxonFlow supports Google's Gemini models for LLM routing and orchestration. Gemini provides multimodal capabilities with large context windows and competitive pricing.
Prerequisites
- Google Cloud account or AI Studio access
- API key from Google AI Studio
Quick Start
1. Get API Key
- Go to Google AI Studio
- Create or select a Google Cloud project
- Click "Create API Key"
- Copy the generated key
2. Configure AxonFlow
```bash
# Required
export GOOGLE_API_KEY=your-api-key-here

# Optional: Specify model (default: gemini-2.0-flash)
export GOOGLE_MODEL=gemini-2.5-flash
```
3. Start AxonFlow
```bash
docker compose up -d
```
Supported Models
| Model | Context Window | Best For |
|---|---|---|
| gemini-2.5-flash | 1M tokens | Latest, fastest model |
| gemini-2.5-pro | 2M tokens | Latest, highest quality |
| gemini-2.0-flash | 1M tokens | Fast, general-purpose (default) |
| gemini-2.0-flash-lite | 1M tokens | Cost-optimized, simple tasks |
| gemini-1.5-pro | 2M tokens | Complex reasoning, long context (legacy) |
| gemini-1.5-flash | 1M tokens | Balanced speed/quality (legacy) |
Configuration Options
Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| GOOGLE_API_KEY | Yes | - | Google AI API key |
| GOOGLE_MODEL | No | gemini-2.0-flash | Default model |
| GOOGLE_ENDPOINT | No | https://generativelanguage.googleapis.com | API endpoint |
| GOOGLE_TIMEOUT_SECONDS | No | 120 | Request timeout (seconds) |
YAML Configuration
For more control, use YAML configuration:
```yaml
# axonflow.yaml
llm_providers:
  gemini:
    enabled: true
    config:
      model: gemini-2.0-flash
      max_tokens: 8192
      timeout: 120s
    credentials:
      api_key: ${GOOGLE_API_KEY}
    priority: 8
    weight: 0.3
```
Capabilities
The Gemini provider supports:
- Chat completions - Conversational AI
- Streaming responses - Real-time token streaming
- Long context - Up to 2M tokens (Gemini 2.5 Pro)
- Vision - Image understanding (see the sketch after this list)
- Function calling - Tool use
- Code generation - Programming assistance
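As an illustration of the vision capability, an image can be passed to Gemini as inline base64 data when calling the @google/generative-ai SDK directly (gateway mode). A minimal sketch; the file name, MIME type, and prompt are placeholders:

```typescript
import { readFileSync } from 'node:fs';
import { GoogleGenerativeAI } from '@google/generative-ai';

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
const model = genAI.getGenerativeModel({ model: 'gemini-2.0-flash' });

// Pass an image alongside the text prompt as inline base64 data
const image = {
  inlineData: {
    data: readFileSync('diagram.png').toString('base64'),
    mimeType: 'image/png',
  },
};

const result = await model.generateContent(['Describe this diagram', image]);
console.log(result.response.text());
```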
Usage Examples
Proxy Mode (Python SDK)
Proxy mode routes requests through AxonFlow for simple integration:
```python
import asyncio

from axonflow import AxonFlow


async def main() -> None:
    async with AxonFlow(agent_url="http://localhost:8080") as client:
        # Execute query through AxonFlow (routes to the configured Gemini provider)
        response = await client.execute_query(
            user_token="user-123",
            query="Explain quantum computing",
            request_type="chat",
            context={"provider": "gemini", "model": "gemini-2.0-flash"},
        )
        print(response.content)


asyncio.run(main())
```
Proxy Mode (cURL)
```bash
curl -X POST http://localhost:8080/api/request \
  -H "Content-Type: application/json" \
  -H "X-User-Token: user-123" \
  -d '{
    "query": "What is machine learning?",
    "provider": "gemini",
    "model": "gemini-2.0-flash",
    "max_tokens": 500
  }'
```
Gateway Mode (TypeScript SDK)
Gateway mode gives you full control over the LLM call while AxonFlow handles policy enforcement and audit logging:
```typescript
import { AxonFlow } from '@axonflow/sdk';
import { GoogleGenerativeAI } from '@google/generative-ai';

const axonflow = new AxonFlow({
  endpoint: 'http://localhost:8080',
  apiKey: 'your-axonflow-key'
});

// 1. Pre-check: Get policy approval
const ctx = await axonflow.getPolicyApprovedContext({
  userToken: 'user-123',
  query: 'Explain quantum computing'
});

if (!ctx.approved) {
  throw new Error(`Blocked: ${ctx.blockReason}`);
}

// 2. Call Gemini directly
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY);
const model = genAI.getGenerativeModel({ model: 'gemini-2.0-flash' });
const result = await model.generateContent(ctx.approvedData.query);
const response = result.response.text();

// 3. Audit the call
await axonflow.auditLLMCall({
  contextId: ctx.contextId,
  responseSummary: response.substring(0, 100),
  provider: 'gemini',
  model: 'gemini-2.0-flash',
  tokenUsage: { promptTokens: 50, completionTokens: 100, totalTokens: 150 },
  latencyMs: 250
});
```
Streaming
Gemini supports server-sent events (SSE) for streaming responses. Use the Gemini SDK directly:
```typescript
import { GoogleGenerativeAI } from '@google/generative-ai';

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY);
const model = genAI.getGenerativeModel({ model: 'gemini-2.0-flash' });

const result = await model.generateContentStream('Write a long story');

for await (const chunk of result.stream) {
  const text = chunk.text();
  process.stdout.write(text);
}
```
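Streaming also composes with gateway mode: run the policy pre-check first, stream the response, then audit once the stream completes. A sketch combining the two patterns shown above (the token counts passed to auditLLMCall are illustrative, since streamed usage is not tallied here):

```typescript
import { AxonFlow } from '@axonflow/sdk';
import { GoogleGenerativeAI } from '@google/generative-ai';

const axonflow = new AxonFlow({ endpoint: 'http://localhost:8080', apiKey: 'your-axonflow-key' });

// Policy pre-check before streaming
const ctx = await axonflow.getPolicyApprovedContext({
  userToken: 'user-123',
  query: 'Write a long story'
});
if (!ctx.approved) {
  throw new Error(`Blocked: ${ctx.blockReason}`);
}

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
const model = genAI.getGenerativeModel({ model: 'gemini-2.0-flash' });

const started = Date.now();
const result = await model.generateContentStream(ctx.approvedData.query);

let fullText = '';
for await (const chunk of result.stream) {
  fullText += chunk.text();
  process.stdout.write(chunk.text());
}

// Audit after the stream finishes (token usage below is illustrative)
await axonflow.auditLLMCall({
  contextId: ctx.contextId,
  responseSummary: fullText.substring(0, 100),
  provider: 'gemini',
  model: 'gemini-2.0-flash',
  tokenUsage: { promptTokens: 50, completionTokens: 100, totalTokens: 150 },
  latencyMs: Date.now() - started
});
```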
Pricing
Gemini pricing (as of December 2025):
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Gemini 2.0 Flash | $0.10 | $0.40 |
| Gemini 1.5 Pro (up to 128K) | $1.25 | $5.00 |
| Gemini 1.5 Pro (over 128K) | $2.50 | $10.00 |
| Gemini 1.5 Flash | $0.075 | $0.30 |
AxonFlow provides cost estimation via the /api/cost/estimate endpoint.
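A sketch of calling that endpoint before sending a large prompt; only the path comes from this guide, and the request body fields (provider, model, input_tokens, max_output_tokens) are assumptions for illustration, not a documented schema:

```typescript
// Hypothetical request body: these field names are assumptions, not a documented schema.
const res = await fetch('http://localhost:8080/api/cost/estimate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    provider: 'gemini',
    model: 'gemini-2.0-flash',
    input_tokens: 200000,
    max_output_tokens: 1000
  })
});
console.log(await res.json());
```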
Multi-Provider Routing
Configure Gemini alongside other providers for intelligent routing:
```yaml
llm_providers:
  gemini:
    enabled: true
    config:
      model: gemini-2.0-flash
    credentials:
      api_key: ${GOOGLE_API_KEY}
    priority: 100
  openai:
    enabled: true
    config:
      model: gpt-4o
    credentials:
      api_key: ${OPENAI_API_KEY}
    priority: 50

routing:
  strategy: priority
  fallback_enabled: true
```
Health Checks
The Gemini provider reports its status via the AxonFlow health endpoint:

```bash
curl http://localhost:8081/health
```
Response includes Gemini provider status:
```json
{
  "status": "healthy",
  "providers": {
    "gemini": {
      "status": "healthy",
      "latency_ms": 45
    }
  }
}
```
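For automated monitoring, the same endpoint can be polled and the gemini entry inspected; a minimal sketch based on the response shape above:

```typescript
// Poll the AxonFlow health endpoint and inspect the Gemini provider entry
const res = await fetch('http://localhost:8081/health');
const health = await res.json();

const gemini = health.providers?.gemini;
if (!gemini || gemini.status !== 'healthy') {
  console.error('Gemini provider is unhealthy:', gemini);
} else {
  console.log(`Gemini healthy (latency ${gemini.latency_ms} ms)`);
}
```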
Error Handling
Common error codes from Gemini:
| Status | Reason | Action |
|---|---|---|
| 400 | Invalid request | Check request format |
| 401 | Invalid API key | Verify GOOGLE_API_KEY |
| 403 | Permission denied | Check API key permissions |
| 429 | Rate limit | Implement backoff/retry |
| 500 | Server error | Retry with exponential backoff |
AxonFlow automatically handles retries for transient errors (429, 500, 503).
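If you call Gemini directly in gateway mode, client-side backoff for 429 and 5xx responses is still worthwhile (see Best Practices below). A minimal sketch; the attempt count and delays are illustrative:

```typescript
import { GoogleGenerativeAI } from '@google/generative-ai';

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
const model = genAI.getGenerativeModel({ model: 'gemini-2.0-flash' });

// Retry with exponential backoff on transient failures (settings are illustrative)
async function generateWithRetry(prompt: string, maxAttempts = 3): Promise<string> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const result = await model.generateContent(prompt);
      return result.response.text();
    } catch (err) {
      if (attempt === maxAttempts) throw err;
      // In practice, only retry errors that map to 429/500/503
      await new Promise((resolve) => setTimeout(resolve, 500 * 2 ** attempt));
    }
  }
  throw new Error('unreachable');
}

console.log(await generateWithRetry('Explain quantum computing'));
```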
Best Practices
- Use appropriate models - Gemini Flash for speed, Pro for quality
- Set reasonable timeouts - 120s default is good for most use cases
- Enable fallback providers - Configure OpenAI/Anthropic as backup
- Monitor costs - Use AxonFlow's cost dashboard to track usage
- Handle rate limits - Implement client-side retry logic for high-volume apps
Troubleshooting
"API key not valid"
- Verify the key at Google AI Studio
- Ensure the key has Generative Language API enabled
"Model not found"
- Check the model name spelling (e.g., gemini-2.0-flash, not gemini-1.5-flash)
- Verify model availability in your region
- Note: Gemini 1.5 models are being deprecated; use 2.0+ models
"Quota exceeded"
- Check usage at Google Cloud Console
- Consider upgrading to a paid plan
- Free tier has limited requests per minute
Next Steps
- LLM Providers Overview - All supported providers
- AWS Bedrock Setup - Enterprise cloud provider
- Ollama Setup - Self-hosted deployment
- Custom Provider SDK - Build custom providers