
Overview

Track costs and monitor usage for Google’s Gemini API by routing your requests through LLM Ops. This guide shows you how to integrate using Python, JavaScript, or cURL.
Security Guarantee: LLM Ops does not store your API keys, request prompts, or response content in the analytics database—only metadata needed for cost analytics. The proxy must forward request bodies to Google to complete the call; optional operational logging may exist in your deployment environment.

Quick Start

Point the Gemini client at the LLM Ops API host (same path layout as Google: /v1beta/models/...). The proxy serves:
  • Original API host: https://generativelanguage.googleapis.com
  • LLM Ops API host: https://api.llm-ops.cloudidr.com (paths such as /v1beta/models/{model}:generateContent stay the same; only the host changes)
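To make the host swap concrete, here is a minimal sketch in Python that assembles a proxied generateContent request (the model name, placeholder keys, and the `build_request` helper are our own illustration, not part of any SDK; the path is Google's standard REST route):

```python
import os

# Only the host changes; the Gemini path layout is preserved by the proxy.
PROXY_HOST = "https://api.llm-ops.cloudidr.com"
PATH = "/v1beta/models/gemini-2.0-flash-exp:generateContent"

def build_request(prompt: str):
    """Assemble URL, headers, and JSON body for a proxied generateContent call."""
    url = f"{PROXY_HOST}{PATH}?key={os.environ['GOOGLE_API_KEY']}"
    headers = {"X-Cloudidr-Key": os.environ["CLOUDIDR_KEY"]}
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return url, headers, body

# To send, e.g.: requests.post(url, headers=headers, json=body)
```

The same URL with `generativelanguage.googleapis.com` substituted for the proxy host would hit Google directly, bypassing cost tracking.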

API Keys

You’ll need two credentials:
  1. Google API Key - Your Gemini API key from aistudio.google.com (or your Google Cloud project)
  2. Cloudidr Key - Your tracking token from the LLM Ops dashboard (tokens are typically prefixed with trk_)
The marketing site llmfinops.ai points to the same product; the LLM Ops dashboard is the canonical app host. Set both keys as environment variables:
export GOOGLE_API_KEY="AIzaSy..."
export CLOUDIDR_KEY="trk_..."
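As a sketch, the variables above can be read back at startup so a missing key fails fast (the `load_keys` helper is our own, not part of the SDK):

```python
import os

def load_keys() -> tuple[str, str]:
    """Read both credentials from the environment, failing fast if either is missing."""
    try:
        return os.environ["GOOGLE_API_KEY"], os.environ["CLOUDIDR_KEY"]
    except KeyError as missing:
        raise SystemExit(f"Missing environment variable: {missing.args[0]}")
```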

Integration Examples

Install SDK

pip install google-generativeai

Basic Example

import google.generativeai as genai

genai.configure(
    api_key="AIzaSy...",  # Your Google API key
    transport="rest",
    client_options={
        "api_endpoint": "https://api.llm-ops.cloudidr.com"
    }
)

model = genai.GenerativeModel('gemini-2.0-flash-exp')

response = model.generate_content(
    "What is the capital of France?",
    request_options={
        "headers": {
            "X-Cloudidr-Key": "trk_..."
        }
    }
)

print(response.text)

With Metadata (Department/Team/Agent Tracking)

import google.generativeai as genai

genai.configure(
    api_key="AIzaSy...",
    transport="rest",
    client_options={
        "api_endpoint": "https://api.llm-ops.cloudidr.com"
    }
)

model = genai.GenerativeModel('gemini-2.0-flash-exp')

# X-Project is preferred for team/project; X-Team is a legacy alias
response = model.generate_content(
    "Explain quantum computing in simple terms",
    request_options={
        "headers": {
            "X-Cloudidr-Key": "trk_...",
            "X-Department": "research",
            "X-Team": "ml",
            "X-Agent": "science-explainer"
        }
    }
)

print(response.text)

Streaming Example

import google.generativeai as genai

genai.configure(
    api_key="AIzaSy...",
    transport="rest",
    client_options={
        "api_endpoint": "https://api.llm-ops.cloudidr.com"
    }
)

model = genai.GenerativeModel('gemini-2.0-flash-exp')

response = model.generate_content(
    "Write a story about a robot learning to paint",
    stream=True,
    request_options={
        "headers": {
            "X-Cloudidr-Key": "trk_...",
            "X-Agent": "story-generator"
        }
    }
)

for chunk in response:
    print(chunk.text, end="", flush=True)

Chat Example (Multi-turn)

import google.generativeai as genai

genai.configure(
    api_key="AIzaSy...",
    transport="rest",
    client_options={
        "api_endpoint": "https://api.llm-ops.cloudidr.com"
    }
)

model = genai.GenerativeModel('gemini-2.0-flash-exp')
chat = model.start_chat(history=[])

headers = {
    "X-Cloudidr-Key": "trk_...",
    "X-Agent": "chat-bot"
}

response1 = chat.send_message(
    "Hello! What's your name?",
    request_options={"headers": headers}
)
print(response1.text)

response2 = chat.send_message(
    "Can you help me with Python?",
    request_options={"headers": headers}
)
print(response2.text)

Multimodal Example (Image Analysis)

import google.generativeai as genai
from PIL import Image

genai.configure(
    api_key="AIzaSy...",
    transport="rest",
    client_options={
        "api_endpoint": "https://api.llm-ops.cloudidr.com"
    }
)

model = genai.GenerativeModel('gemini-2.0-flash-exp')

img = Image.open('photo.jpg')
response = model.generate_content(
    ["What's in this image?", img],
    request_options={
        "headers": {
            "X-Cloudidr-Key": "trk_...",
            "X-Agent": "vision-analyzer"
        }
    }
)

print(response.text)

Cost Tracking Headers

| Header | Description | Example |
|---|---|---|
| X-Cloudidr-Key | Required - Your Cloudidr tracking token | trk_abc123... |
| X-Department | Track costs by department | engineering, sales, marketing, support |
| X-Project | Track costs by project/team (preferred) | backend, frontend, ml, data, qa |
| X-Team | Legacy alias for project/team (same as X-Project) | backend, frontend |
| X-Agent | Track costs by agent/application | chatbot, summarizer, analyzer, translator |

Supported Models

All Google Gemini models supported by the proxy are available. See the Supported Models page for the complete list of available models and pricing.

What Gets Tracked

LLM Ops automatically captures:
  • Token usage - Input and output tokens (multimodal input counts toward input tokens)
  • Cost - Real-time cost calculation
  • Latency - Request duration
  • Model - Which Gemini model was used
  • Metadata - Department, team, agent
  • Errors - Failed requests and error types
Media you send affects token usage; totals appear in input tokens from Google's usage metadata. Google may also report hidden or thoughts tokens (context, safety, etc.) in usage; LLM Ops uses those counts for billing alignment where present.
What We DON’T Track:
  • ❌ Customer API keys
  • ❌ Request content (prompts)
  • ❌ Response content (completions)
  • ❌ Raw image/video/audio bytes in our analytics database
We only persist the metadata needed for cost analytics in our application database. Full Gemini safety rating objects are not stored as separate dashboard fields unless your product explicitly adds that; token-based usage is the primary signal in the proxy.
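As a back-of-the-envelope illustration of the cost calculation, here is a sketch that multiplies token counts from usage metadata by per-million-token rates (the rates used below are placeholders for illustration, not actual Gemini pricing):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Cost = tokens / 1M * rate, summed over input and output."""
    return (input_tokens / 1_000_000 * input_rate_per_m
            + output_tokens / 1_000_000 * output_rate_per_m)

# With placeholder rates of $0.10/M input and $0.40/M output:
# estimate_cost(500_000, 100_000, 0.10, 0.40) -> 0.09 (0.05 input + 0.04 output)
```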

View Your Data

After making requests, view your costs in the LLM Ops Dashboard:
  • Agent Explorer - See costs by agent/application
  • Department Breakdown - Compare department spending
  • Team Analysis - Track team-level costs
  • Model Comparison - Compare costs across Gemini models
  • Time Series - Track spending over time

Migration from Direct API

Switching from direct Gemini API to LLM Ops requires updating the endpoint and adding the tracking header on each request:
# Before
genai.configure(api_key="AIzaSy...")

# After - point REST transport at the proxy host
genai.configure(
    api_key="AIzaSy...",
    transport="rest",
    client_options={
        "api_endpoint": "https://api.llm-ops.cloudidr.com"
    }
)

response = model.generate_content(
    "Your prompt",
    request_options={
        "headers": {"X-Cloudidr-Key": "trk_..."}
    }
)

Multimodal Support

Gemini supports images, video, and audio. All requests go through the same proxy and are billed from Google's usage metadata; the integration is identical to the Multimodal Example above: configure the proxy endpoint, pass the PIL image alongside the prompt, and include the tracking headers.
Multimodal token tracking: Google converts images/video/audio to tokens and includes them in usage metadata. LLM Ops records total input/output tokens and cost; it does not typically store a separate line item per modality in the database.

Cost Optimization Tips

Images and videos can consume significant tokens:
  • Track total input token usage in dashboard
  • Identify agents with high token consumption
  • Optimize image resolution before sending to API
LLM Ops tracks total input tokens (text + multimodal combined in usage).
Many Gemini models support large context windows (limits vary by model):
  • Process large documents in fewer calls when appropriate
  • Balance context size vs. token cost
Fewer round trips can reduce overhead; very large contexts still incur proportional token cost.
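One way to act on the batching tip is to greedily pack documents into fewer calls under a size budget (character count is a rough proxy for tokens here; the `pack_documents` helper is our own sketch):

```python
def pack_documents(docs: list[str], max_chars: int) -> list[list[str]]:
    """Greedily group documents into batches whose combined size stays under budget."""
    batches, current, size = [], [], 0
    for doc in docs:
        # Start a new batch when adding this doc would exceed the budget.
        if current and size + len(doc) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(doc)
        size += len(doc)
    if current:
        batches.append(current)
    return batches
```

Each batch can then be joined into a single prompt, trading one large-context call for several small ones.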
Use the dashboard to find cost-saving opportunities:
  • Track performance vs. cost by model
  • Test different Gemini variants for your workload
  • Move high-volume, low-complexity tasks to cheaper models where quality allows

Troubleshooting

Check these common issues:
  • ✅ Use API host https://api.llm-ops.cloudidr.com with Gemini paths (/v1beta/models/...)—same structure as generativelanguage.googleapis.com.
  • ✅ Confirm the header name is X-Cloudidr-Key (not X-Cloudidr-Token) on every request.
  • ✅ Pass your Google API key as Google expects (?key=, x-goog-api-key, or Authorization, depending on client).
  • ✅ Verify your Cloudidr tracking token is valid.
Two separate keys are needed:
  • Your Google API key (for Gemini access)
  • Your Cloudidr tracking token (for cost tracking)
Make sure both are set correctly and not swapped.
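A quick sanity check for swapped keys, based on the prefixes shown above (AIzaSy for Google API keys, trk_ for Cloudidr tokens); purely illustrative:

```python
def classify_key(value: str) -> str:
    """Guess which credential a string is, using the documented prefixes."""
    if value.startswith("AIzaSy"):
        return "google"
    if value.startswith("trk_"):
        return "cloudidr"
    return "unknown"
```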
Wait a few moments:
  • Cost data may take 10-30 seconds to appear in dashboard
  • Check the correct time range in dashboard filters
  • Verify requests are returning 200 OK status

Next Steps

View Dashboard

See your Gemini API costs in real-time

Supported Models

View all supported Gemini models

OpenAI Integration

Add cost tracking for GPT models

Set Budgets

Configure spending alerts and limits