
Overview

These are the models supported by CloudIDR LLM Ops for API cost tracking. We use the providers' published pricing to calculate your spend.
Last Updated: January 10, 2026
Model pricing is subject to change by the providers. We update our pricing regularly to ensure accurate cost tracking.
Model Not Listed?
If your model is not in this list, please contact us at [email protected] and we’ll add support for it.

Pricing Tables

Anthropic Claude Models

All pricing is per 1 million tokens.
Model | Model ID | Input Cost | Output Cost
Claude Opus 4.5 | claude-opus-4-5-20251101 | $5.00 | $25.00
Claude Opus 4.1 | claude-opus-4-1-20250805 | $15.00 | $75.00
Claude Opus 4 | claude-opus-4-20250514 | $15.00 | $75.00
Claude Sonnet 4.5 | claude-sonnet-4-5-20250929 | $3.00 | $15.00
Claude Sonnet 4 | claude-sonnet-4-20250514 | $3.00 | $15.00
Claude Haiku 4.5 | claude-haiku-4-5-20251001 | $1.00 | $5.00
Claude 3.5 Haiku | claude-3-5-haiku-20241022 | $0.80 | $4.00
Claude 3 Haiku | claude-3-haiku-20240307 | $0.25 | $1.25
Model Recommendations:
  • Opus - Most capable, best for complex reasoning
  • Sonnet - Balanced performance and cost
  • Haiku - Fastest and most affordable

Integration Guide

See the Anthropic Integration page to start tracking costs.

Cost Comparison

All text model prices below are per 1 million tokens.
Best for high-volume, simple tasks:
  • GPT-5 Nano: $0.05 input / $0.40 output
  • Gemini 2.0 Flash Lite: $0.075 input / $0.30 output
  • Gemini 1.5 Flash: $0.075 input / $0.30 output
  • GPT-4.1 Nano: $0.10 input / $0.40 output
  • Gemini 2.5 Flash Lite: $0.10 input / $0.40 output
  • GPT-4o Mini: $0.15 input / $0.60 output
  • Claude 3 Haiku: $0.25 input / $1.25 output
  • Claude 3.5 Haiku: $0.80 input / $4.00 output
Perfect for: Classification, extraction, simple Q&A, high-throughput tasks
Balanced performance and cost:
  • Claude Haiku 4.5: $1.00 input / $5.00 output
  • GPT-4.1: $2.00 input / $8.00 output
  • GPT-4o: $2.50 input / $10.00 output
  • Claude Sonnet 4.5: $3.00 input / $15.00 output
  • Claude Opus 4.5: $5.00 input / $25.00 output
Perfect for: Customer support, content generation, code assistance
Advanced reasoning and complex tasks:
  • Claude Opus 4: $15.00 input / $75.00 output
  • o1: $15.00 input / $60.00 output
  • o3: $2.00 input / $8.00 output
  • o1-pro: $150.00 input / $600.00 output
  • GPT-5 Pro: $15.00 input / $120.00 output
  • GPT-4 (legacy): $30.00 input / $60.00 output
Perfect for: Complex reasoning, research, code generation, expert analysis
AI image generation models:
  • Imagen 4 Fast: $0.02 per image
  • DALL-E 3 Standard (1024×1024): $0.040 per image
  • Imagen 4 Standard: $0.04 per image
  • gemini-2.5-flash-image: $0.039 per image (+ token costs)
  • Imagen 4 Ultra: $0.06 per image
  • DALL-E 3 HD (1024×1024): $0.080 per image
  • DALL-E 3 Large (1792×1024): $0.080-$0.120 per image
Perfect for: Marketing materials, product images, illustrations
AI video generation models:
  • Veo 3.0/3.1 Fast: $0.15 per second
  • Veo 3.0/3.1 Standard: $0.40 per second
Perfect for: Marketing videos, product demos, content creation
Audio transcription models:
  • Whisper-1: $0.006 per minute ($0.0001/second)
Text-to-speech models:
  • TTS-1 (Standard): $15.00 per 1M characters
  • TTS-1-HD (High Definition): $30.00 per 1M characters
Perfect for: Transcription services, voice assistants, audiobooks, accessibility

How Pricing Works

Token Calculation

LLM Ops tracks both input and output tokens separately:
  • Input tokens = Your prompt + any system messages + conversation history
  • Output tokens = The model’s response
Example:
Input: "Write a haiku about AI" (6 tokens)
Output: "Silicon dreams flow / Algorithms learn and grow / Future unfolds now" (15 tokens)

Cost with GPT-4o:
Input: 6 tokens × $2.50/1M = $0.000015
Output: 15 tokens × $10.00/1M = $0.00015
Total: $0.000165

Cost Calculation

Your total cost is calculated as:
Total Cost = (Input Tokens × Input Price) + (Output Tokens × Output Price)
All costs are tracked in real-time and displayed in your LLM Ops Dashboard.
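
A minimal Python sketch of this formula, assuming prices copied from the tables above; the function and table names are illustrative and not part of the LLM Ops API:

# Prices in USD per 1 million tokens (illustrative subset of the tables above).
PRICES = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "claude-sonnet-4-5-20250929": {"input": 3.00, "output": 15.00},
}

def token_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    # Total Cost = (Input Tokens × Input Price) + (Output Tokens × Output Price)
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# The haiku example above: 6 input tokens and 15 output tokens with GPT-4o.
print(token_cost("gpt-4o", 6, 15))  # 0.000165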

Image & Video Generation Pricing

Image Generation Models

Models like DALL-E 3, Imagen 4, and gemini-2.5-flash-image generate images and are priced differently.
DALL-E 3 (Per-Image):
Cost = Number of Images × Price Per Image

Example (Standard 1024×1024):
Generate 5 images with dall-e-3
Cost = 5 images × $0.040 = $0.20

Example (HD 1024×1792):
Generate 3 images with dall-e-3 HD quality
Cost = 3 images × $0.120 = $0.36
Imagen 4 Models (Per-Image):
Cost = Number of Images × Price Per Image

Example:
Generate 5 images with imagen-4.0-fast-generate-001
Cost = 5 images × $0.02 = $0.10
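
Both per-image models reduce to the same multiplication. A minimal sketch, with per-image prices copied from the lists above; the dictionary keys and function name are illustrative:

# Illustrative per-image prices in USD, taken from the image generation list above.
PER_IMAGE = {
    "dall-e-3-standard-1024x1024": 0.040,
    "dall-e-3-hd-1024x1792": 0.120,
    "imagen-4.0-fast-generate-001": 0.02,
}

def image_cost(model_key: str, num_images: int) -> float:
    # Cost = Number of Images × Price Per Image
    return num_images * PER_IMAGE[model_key]

print(image_cost("dall-e-3-standard-1024x1024", 5))   # 0.20
print(image_cost("imagen-4.0-fast-generate-001", 5))  # 0.10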
gemini-2.5-flash-image (Token-Based):
Text Input Cost = Input Tokens × $0.30/1M
Text Output Cost = Text Output Tokens × $2.50/1M
Image Output Cost = Image Output Tokens × $30.00/1M

Example:
Input: "Generate a sunset image" (100 tokens)
Output: Text description (50 tokens) + Image (1,290 tokens)

Text Input: 100 × $0.30/1M = $0.00003
Text Output: 50 × $2.50/1M = $0.000125
Image Output: 1,290 × $30.00/1M = $0.0387
Total: $0.038855

Note: Each image consumes approximately 1,290 tokens
This equals ~$0.039 per image
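
A minimal sketch of the token-based calculation above; the rates and the ~1,290 tokens per image come from this section, and the function name is illustrative:

def gemini_flash_image_cost(text_in: int, text_out: int, image_out: int) -> float:
    # gemini-2.5-flash-image: text input $0.30/1M, text output $2.50/1M,
    # image output $30.00/1M (each generated image is roughly 1,290 tokens).
    return (text_in * 0.30 + text_out * 2.50 + image_out * 30.00) / 1_000_000

# The example above: 100 input tokens, 50 text output tokens, one 1,290-token image.
print(gemini_flash_image_cost(100, 50, 1290))  # ~0.038855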

Video Generation Models

Veo Models (Per-Second):
Cost = Video Duration (seconds) × Price Per Second

Example:
Generate 10-second video with veo-3.0-fast-generate-001
Cost = 10 seconds × $0.15 = $1.50

Audio Transcription Models

Whisper (Per-Second):
Cost = Audio Duration (seconds) × Price Per Second

Example:
Transcribe 125.5-second audio file with whisper-1
Cost = 125.5 seconds × $0.0001 = $0.01255

Note: $0.0001/second = $0.006/minute

Text-to-Speech Models

TTS Models (Per-Character):
Cost = (Characters / 1,000,000) × Price Per Million Characters

Example (Standard):
Generate speech from 1,250 characters with tts-1
Cost = (1,250 / 1,000,000) × $15.00 = $0.01875

Example (HD):
Generate speech from 1,250 characters with tts-1-hd
Cost = (1,250 / 1,000,000) × $30.00 = $0.0375
Provider Counting:
Image, video, and audio generation costs are calculated based on what the provider reports:
  • Images (DALL-E 3, Imagen 4): Provider returns number of images generated
  • Videos (Veo): Provider returns video duration in seconds
  • Audio (Whisper): Provider returns audio duration in seconds
  • TTS (TTS-1, TTS-1-HD): Provider returns character count of input text
We trust the provider’s counts and multiply by our pricing table.
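
The per-unit models above (video seconds, transcribed audio seconds, TTS input characters) all reduce to the provider-reported unit count multiplied by the rate in the pricing table. A minimal sketch with rates copied from the sections above; the dictionary keys and function name are illustrative:

# Illustrative per-unit rates in USD, copied from the sections above.
PER_UNIT_RATES = {
    "veo-3.0-fast-generate-001": 0.15,   # per second of generated video
    "whisper-1": 0.0001,                 # per second of transcribed audio
    "tts-1": 15.00 / 1_000_000,          # per input character
    "tts-1-hd": 30.00 / 1_000_000,       # per input character
}

def per_unit_cost(model_key: str, units: float) -> float:
    # Cost = provider-reported units × price per unit
    return units * PER_UNIT_RATES[model_key]

print(per_unit_cost("veo-3.0-fast-generate-001", 10))  # 1.50
print(per_unit_cost("whisper-1", 125.5))               # 0.01255
print(per_unit_cost("tts-1", 1250))                    # 0.01875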

Multimodal Pricing

Two Types of Multimodal Models

1. Multimodal Understanding Models (Token-Based)
  • These models analyze images, videos, and audio
  • Input media is converted to tokens by the provider
  • Charged per token (text + media tokens combined)
  • Examples: GPT-4o, Claude Opus, Gemini 2.5 Flash
2. Media Generation Models (Per-Unit)
  • These models create images, videos, or audio
  • DALL-E 3, Imagen 4: Charged per image generated
  • Veo 3: Charged per second of video generated
  • Whisper: Charged per second of audio transcribed
  • TTS: Charged per character of text input
  • gemini-2.5-flash-image: Hybrid (token-based, but image output uses premium rate)
How Multimodal Tokens Are Tracked:
For understanding models, providers (OpenAI, Anthropic, Google) automatically convert images, video, and audio into tokens and include them in the response. LLM Ops tracks the total token count returned by the provider. Image/video/audio input tokens are included in input_tokens; they are not tracked separately.
For generation models, we track based on what the provider charges:
  • Text tokens: Standard input/output pricing
  • Images generated: Per-image or per-token (depending on model)
  • Video generated: Per-second of video
  • Audio transcribed: Per-second of audio
  • TTS generated: Per-character of input text

Multimodal Understanding Models (Token-Based)

These models analyze images, video, and audio sent as input:
Models with image understanding:
  • Gemini 2.0/2.5 Flash
  • GPT-4o
  • GPT-4o Mini
  • Claude Opus 4/4.5
  • Claude Sonnet 4/4.5
How it works:
  1. You send an image with your prompt
  2. Provider converts image to tokens based on resolution
  3. Provider returns total input_tokens (text + image)
  4. LLM Ops tracks the total as input tokens
  5. Cost = input_tokens × input_price
Note: Higher resolution images = more input tokens = higher cost

Pricing Breakdown Summary

Here’s how different types of content are charged:
Content Type | Understanding (Input) | Generation (Output) | Examples
Text | Per token | Per token | All models
Images (Input) | Converted to tokens by provider; included in input_tokens | N/A | GPT-4o, Claude Opus, Gemini Flash
Audio (Input) | Converted to tokens by provider; included in input_tokens; some models charge a premium rate (audio input: $1.00/1M vs $0.30/1M text) | N/A | GPT-4o Audio, Gemini Flash
Audio Transcription | Per second ($0.0001/sec = $0.006/min) | Text output (per token) | whisper-1
Video (Input) | Converted to tokens by provider; included in input_tokens | N/A | Gemini 2.5 Flash
Images (Output) | N/A | DALL-E 3: per image ($0.040-$0.120); Imagen 4: per image ($0.02-$0.06); gemini-2.5-flash-image: per token ($30/1M, ~$0.039/image) | dall-e-3, imagen-4.0-fast, gemini-2.5-flash-image
Video (Output) | N/A | Per second ($0.15-$0.40/sec) | veo-3.0-fast, veo-3.1
Audio (Output - TTS) | N/A | OpenAI TTS: per character ($15-$30/1M); Gemini TTS: per token ($10-$20/1M) | tts-1, tts-1-hd, gemini-2.5-flash-tts
Key Takeaways:
  • Understanding (Input): Media → Tokens → Cost per token
  • Generation (Output):
    • Images: Per image (DALL-E 3, Imagen 4) OR per token (gemini-2.5-flash-image)
    • Video: Per second (Veo)
    • Audio Transcription: Per second (Whisper)
    • Text-to-Speech: Per character (OpenAI TTS) OR per token (Gemini TTS)
  • Provider Controls Conversion: We trust provider counts
  • Not Tracked Separately: Input media tokens are combined with text tokens in input_tokens

Example: Image Token Calculation

Request:
- Text prompt: "Describe this image" (4 tokens)
- Image: 1024x1024 JPG (converted to 765 tokens by provider)

Provider Response:
{
  "usage": {
    "input_tokens": 769,     // 4 text + 765 image
    "output_tokens": 50      // Response tokens
  }
}

Cost Calculation (using GPT-4o pricing):
Input: 769 tokens × $2.50/1M = $0.0019225
Output: 50 tokens × $10.00/1M = $0.0005
Total: $0.0024225
LLM Ops Dashboard shows:
  • Input Tokens: 769 (includes both text and image)
  • Output Tokens: 50
  • Total Cost: $0.0024225
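
A minimal sketch of how that usage object maps to cost; the dictionary shape mirrors the provider response above, the GPT-4o rates come from the pricing tables, and the function name is illustrative:

# Usage reported by the provider for the image-understanding request above.
usage = {"input_tokens": 769, "output_tokens": 50}  # 769 = 4 text + 765 image tokens

GPT_4O_PRICES = {"input": 2.50, "output": 10.00}  # USD per 1M tokens

def usage_cost(usage: dict, prices: dict) -> float:
    # Image tokens are already folded into input_tokens, so the formula is unchanged.
    return (usage["input_tokens"] * prices["input"]
            + usage["output_tokens"] * prices["output"]) / 1_000_000

print(usage_cost(usage, GPT_4O_PRICES))  # 0.0024225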
Image/Audio/Video Breakdown Not Available:
LLM Ops does not currently separate multimodal tokens from text tokens. All input tokens (text + image + video + audio) are tracked together as input_tokens. If you need separate multimodal token tracking, please contact us at [email protected].

Pricing Updates

Model pricing is set by the providers (Anthropic, OpenAI, Google) and can change at any time. How we handle updates:
  • ✅ We monitor provider pricing pages daily
  • ✅ Updates are applied within 24 hours of provider changes
  • ✅ Historical data uses pricing from the time of request
  • ✅ You’re notified of major pricing changes
Last pricing update: January 10, 2026
Check this page regularly for pricing updates.

Need a Model Added?

If you’re using a model that’s not listed here:

1. Check Provider Documentation
   Verify the model exists in your provider’s official API docs.

2. Contact Us
   Email [email protected] with:
     • Model name
     • Provider (Anthropic/OpenAI/Google)
     • Link to provider pricing

3. We'll Add It
   We typically add new models within 2-3 business days.

Next Steps