
Overview

Track costs and monitor usage for Google’s Gemini API by routing your requests through LLM Ops. This guide shows you how to integrate using Python, JavaScript, or cURL.
Security Guarantee: LLM Ops does NOT store your API keys, request prompts, or response content. We only track metadata needed for cost analytics.

Quick Start

Replace Google’s base URL with the LLM Ops proxy URL to automatically track costs (a quick REST sketch follows the URLs):
  • Original: https://generativelanguage.googleapis.com
  • LLM Ops: https://api.llm-ops.cloudidr.com
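For a quick check without the SDK, here is a minimal REST sketch using the requests library. It assumes the proxy mirrors Google’s /v1beta path layout (see Troubleshooting below) and forwards the standard ?key= query parameter; the model name and prompt are placeholder examples.

import os
import requests

# Same /v1beta path as Google's API - only the host changes
url = (
    "https://api.llm-ops.cloudidr.com/v1beta/models/"
    "gemini-2.0-flash-exp:generateContent"
)

response = requests.post(
    url,
    params={"key": os.environ["GOOGLE_API_KEY"]},                 # Gemini API key
    headers={"X-Cloudidr-Token": os.environ["CLOUDIDR_TOKEN"]},   # tracking token
    json={"contents": [{"parts": [{"text": "What is the capital of France?"}]}]},
)
response.raise_for_status()
print(response.json()["candidates"][0]["content"]["parts"][0]["text"])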

API Keys

You’ll need two credentials:
  1. Google API Key - Your Gemini API key from aistudio.google.com
  2. CloudIDR Token - Your tracking token from llmfinops.ai
Set them as environment variables:
export GOOGLE_API_KEY="AIzaSy..."
export CLOUDIDR_TOKEN="cloudidr_..."
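The examples below hard-code placeholder values for readability; in your own code you can read both credentials from these environment variables instead, for example:

import os

# Read the credentials exported above instead of hard-coding them
google_api_key = os.environ["GOOGLE_API_KEY"]
cloudidr_token = os.environ["CLOUDIDR_TOKEN"]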

Integration Examples

Install SDK

pip install google-generativeai

Basic Example

import google.generativeai as genai

# Configure with LLM Ops proxy
genai.configure(
    api_key="AIzaSy...",  # Your Google API key
    transport="rest",
    client_options={
        "api_endpoint": "https://api.llm-ops.cloudidr.com"
    }
)

# Create model
model = genai.GenerativeModel('gemini-2.0-flash-exp')

# Make API call with tracking headers
response = model.generate_content(
    "What is the capital of France?",
    request_options={
        "headers": {
            "X-Cloudidr-Token": "cloudidr_..."
        }
    }
)

print(response.text)

With Metadata (Department/Team/Agent Tracking)

import google.generativeai as genai

genai.configure(
    api_key="AIzaSy...",
    transport="rest",
    client_options={
        "api_endpoint": "https://api.llm-ops.cloudidr.com"
    }
)

model = genai.GenerativeModel('gemini-2.0-flash-exp')

# Track costs by department, team, and agent
response = model.generate_content(
    "Explain quantum computing in simple terms",
    request_options={
        "headers": {
            "X-Cloudidr-Token": "cloudidr_...",
            "X-Department": "research",
            "X-Team": "ml",
            "X-Agent": "science-explainer"
        }
    }
)

print(response.text)

Streaming Example

import google.generativeai as genai

genai.configure(
    api_key="AIzaSy...",
    transport="rest",
    client_options={
        "api_endpoint": "https://api.llm-ops.cloudidr.com"
    }
)

model = genai.GenerativeModel('gemini-2.0-flash-exp')

# Streaming is fully supported
response = model.generate_content(
    "Write a story about a robot learning to paint",
    stream=True,
    request_options={
        "headers": {
            "X-Cloudidr-Token": "cloudidr_...",
            "X-Agent": "story-generator"
        }
    }
)

for chunk in response:
    print(chunk.text, end="", flush=True)

Chat Example (Multi-turn)

import google.generativeai as genai

genai.configure(
    api_key="AIzaSy...",
    transport="rest",
    client_options={
        "api_endpoint": "https://api.llm-ops.cloudidr.com"
    }
)

model = genai.GenerativeModel('gemini-2.0-flash-exp')
chat = model.start_chat(history=[])

headers = {
    "X-Cloudidr-Token": "cloudidr_...",
    "X-Agent": "chat-bot"
}

# All messages are tracked with costs
response1 = chat.send_message(
    "Hello! What's your name?",
    request_options={"headers": headers}
)
print(response1.text)

response2 = chat.send_message(
    "Can you help me with Python?",
    request_options={"headers": headers}
)
print(response2.text)

Multimodal Example (Image Analysis)

import google.generativeai as genai
from PIL import Image

genai.configure(
    api_key="AIzaSy...",
    transport="rest",
    client_options={
        "api_endpoint": "https://api.llm-ops.cloudidr.com"
    }
)

model = genai.GenerativeModel('gemini-2.0-flash-exp')

# Analyze an image - costs are tracked
img = Image.open('photo.jpg')
response = model.generate_content(
    ["What's in this image?", img],
    request_options={
        "headers": {
            "X-Cloudidr-Token": "cloudidr_...",
            "X-Agent": "vision-analyzer"
        }
    }
)

print(response.text)

Cost Tracking Headers

Add these headers to organize your costs by department, team, or agent:
Header            | Description                               | Example
X-Cloudidr-Token  | Required - Your CloudIDR tracking token   | cloudidr_abc123...
X-Department      | Track costs by department                 | engineering, sales, marketing, support
X-Team            | Track costs by team                       | backend, frontend, ml, data, qa
X-Agent           | Track costs by agent/application          | chatbot, summarizer, analyzer, translator
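A convenient pattern is to build the headers once and reuse them for every call, as the chat example above does. A sketch with illustrative department, team, and agent values:

# Define tracking headers once and reuse them across requests
tracking_headers = {
    "X-Cloudidr-Token": "cloudidr_...",   # required
    "X-Department": "engineering",        # optional - example value
    "X-Team": "backend",                  # optional - example value
    "X-Agent": "chatbot",                 # optional - example value
}

response = model.generate_content(
    "Your prompt",
    request_options={"headers": tracking_headers},
)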

Supported Models

All Google Gemini models are supported. See the Supported Models page for the complete list of available models and pricing.

What Gets Tracked

LLM Ops automatically captures:
  • Token usage - Input and output tokens
  • Cost - Real-time cost calculation
  • Latency - Request duration
  • Model - Which Gemini model was used
  • Metadata - Department, team, agent
  • Errors - Failed requests and error types
  • Safety ratings - Content safety scores
  • Multimodal inputs - Image/video/audio tokens (included in input_tokens)
What We DON’T Track:
  • ❌ Customer API keys
  • ❌ Request content (prompts)
  • ❌ Response content (completions)
  • ❌ Image/video/audio file contents
  • ❌ Separate multimodal token breakdown (all included in input_tokens)
We only track metadata needed for cost analytics.

View Your Data

After making requests, view your costs in the LLM Ops Dashboard:
  • Agent Explorer - See costs by agent/application
  • Department Breakdown - Compare department spending
  • Team Analysis - Track team-level costs
  • Model Comparison - Compare costs across Gemini models
  • Time Series - Track spending over time
  • Safety Analytics - Monitor content safety scores

Migration from Direct API

Switching from the direct Gemini API to LLM Ops requires two small changes: point the client at the proxy endpoint and add the tracking header to each request:
# Before
genai.configure(api_key="AIzaSy...")

# After - add proxy endpoint and headers
genai.configure(
    api_key="AIzaSy...",
    transport="rest",
    client_options={
        "api_endpoint": "https://api.llm-ops.cloudidr.com"  # ← Add this
    }
)

# Then add headers to each request
response = model.generate_content(
    "Your prompt",
    request_options={
        "headers": {"X-Cloudidr-Token": "cloudidr_..."}  # ← Add this
    }
)

Multimodal Support

Gemini supports images, video, and audio, and LLM Ops tracks all of them. The request pattern is identical to the Multimodal Example (Image Analysis) above: pass the media object alongside your prompt and include the tracking headers in request_options.
Multimodal Token Tracking: Google converts images, video, and audio into tokens and includes them in the input_tokens count. LLM Ops tracks the total input tokens returned by Google; image, video, and audio tokens are not broken out separately.

Cost Optimization Tips

Images and videos consume significant tokens:
  • Track total input token usage in the dashboard
  • Identify agents with high token consumption
  • Optimize image resolution before sending to the API (see the sketch below)
  • Use lower resolution for simple tasks
LLM Ops tracks total input tokens (text + multimodal combined).
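As a concrete example of the resolution tip above, here is a minimal sketch using Pillow to downscale an image before sending it. The 512-pixel cap is an arbitrary example value, and model is the configured GenerativeModel from the examples above.

from PIL import Image

# Downscale before sending: thumbnail() resizes in place, preserving aspect ratio
img = Image.open('photo.jpg')
img.thumbnail((512, 512))  # example cap - tune per task

response = model.generate_content(
    ["What's in this image?", img],
    request_options={"headers": {"X-Cloudidr-Token": "cloudidr_..."}},
)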
Gemini supports massive context windows (up to 2M tokens):
  • Process entire documents in one call
  • Reduce chunking overhead
  • Maintain full context for better results
This can reduce the total number of API calls you need, lowering costs (see the sketch below).
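For example, a long report can be summarized in a single tracked call rather than being split into chunks. report.txt is a hypothetical file, and model is the configured GenerativeModel from the examples above.

# Process a whole document in one call instead of chunking it (hypothetical file)
with open('report.txt', encoding='utf-8') as f:
    document = f.read()

response = model.generate_content(
    f"Summarize the key findings of this report:\n\n{document}",
    request_options={"headers": {"X-Cloudidr-Token": "cloudidr_..."}},
)
print(response.text)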
Use the dashboard to identify cost-saving opportunities:
  • Track performance vs. cost by model
  • Test different Gemini variants
  • Switch high-volume tasks to cheaper models
This makes it easy to find the right model for each task; a comparison sketch follows.
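One way to populate the Model Comparison view is to run the same prompt through two Gemini variants and compare the resulting cost and latency in the dashboard. The model names below are illustrative examples, not recommendations.

# Send the same task to two model variants; the dashboard's Model Comparison
# view then shows their cost and latency side by side
for model_name in ['gemini-1.5-flash', 'gemini-1.5-pro']:
    candidate = genai.GenerativeModel(model_name)
    candidate.generate_content(
        "Summarize this support ticket: ...",
        request_options={"headers": {
            "X-Cloudidr-Token": "cloudidr_...",
            "X-Agent": "model-benchmark"
        }}
    )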

Troubleshooting

Check these common issues:
  • ✅ Verify base URL is https://api.llm-ops.cloudidr.com
  • ✅ Confirm X-Cloudidr-Token header is included in all requests
  • ✅ Check that your Google API key is valid
  • ✅ Ensure you’re using /v1beta in the endpoint path for Gemini
Two separate keys are needed:
  • Your Google API key (for Gemini access)
  • Your CloudIDR token (for cost tracking)
Make sure both are set correctly and not swapped.
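A quick sanity check for the two credentials, assuming the prefixes shown earlier in this guide (AIzaSy... for the Google key, cloudidr_... for the CloudIDR token):

import os

google_key = os.environ.get("GOOGLE_API_KEY", "")
cloudidr_token = os.environ.get("CLOUDIDR_TOKEN", "")

# Prefix checks based on the example values earlier in this guide
assert google_key.startswith("AIzaSy"), "GOOGLE_API_KEY looks missing or swapped"
assert cloudidr_token.startswith("cloudidr_"), "CLOUDIDR_TOKEN looks missing or swapped"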
If data doesn’t appear immediately:
  • Cost data may take 10-30 seconds to appear in the dashboard
  • Check that the dashboard filters cover the correct time range
  • Verify requests are returning a 200 OK status

Next Steps