Overview
These are the models that CloudIDR LLM Ops supports for API cost tracking. We use each provider's published pricing to calculate your spend.
Last Updated: January 10, 2026
Model pricing is subject to change by the providers. We update our pricing regularly to ensure accurate cost tracking.
Pricing Tables
- Anthropic
- OpenAI
- Google
Anthropic Claude Models
All pricing is per 1 million tokens.

| Model | Input Cost | Output Cost |
|---|---|---|
| Claude Opus 4.5 | | |
| claude-opus-4-5-20251101 | $5.00 | $25.00 |
| Claude Opus 4.1 | | |
| claude-opus-4-1-20250805 | $15.00 | $75.00 |
| Claude Opus 4 | | |
| claude-opus-4-20250514 | $15.00 | $75.00 |
| Claude Sonnet 4.5 | | |
| claude-sonnet-4-5-20250929 | $3.00 | $15.00 |
| Claude Sonnet 4 | | |
| claude-sonnet-4-20250514 | $3.00 | $15.00 |
| Claude Haiku 4.5 | | |
| claude-haiku-4-5-20251001 | $1.00 | $5.00 |
| Claude 3.5 Haiku | | |
| claude-3-5-haiku-20241022 | $0.80 | $4.00 |
| Claude 3 Haiku | | |
| claude-3-haiku-20240307 | $0.25 | $1.25 |
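As a rough illustration of how the rates in the table above translate into a per-request cost, the sketch below stores a few of them in a dictionary and applies the per-million-token formula. The names `CLAUDE_PRICING` and `claude_cost` are hypothetical and are not part of LLM Ops; only the prices come from the table.

```python
from decimal import Decimal

# USD per 1 million tokens, copied from the table above: (input, output).
CLAUDE_PRICING = {
    "claude-opus-4-5-20251101": (Decimal("5.00"), Decimal("25.00")),
    "claude-sonnet-4-5-20250929": (Decimal("3.00"), Decimal("15.00")),
    "claude-haiku-4-5-20251001": (Decimal("1.00"), Decimal("5.00")),
    "claude-3-haiku-20240307": (Decimal("0.25"), Decimal("1.25")),
}

TOKENS_PER_MILLION = Decimal(1_000_000)


def claude_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the USD cost of one request (hypothetical helper)."""
    input_price, output_price = CLAUDE_PRICING[model]
    total = input_tokens * input_price + output_tokens * output_price
    return total / TOKENS_PER_MILLION


# Example: 1,000 input tokens and 500 output tokens on Claude Sonnet 4.5.
sonnet_cost = claude_cost("claude-sonnet-4-5-20250929", 1_000, 500)
```

`Decimal` is used here instead of `float` so that small per-request amounts add up without binary rounding drift; that is a design choice for the sketch, not a statement about how LLM Ops stores costs.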
Model Recommendations:
- Opus - Most capable, best for complex reasoning
- Sonnet - Balanced performance and cost
- Haiku - Fastest and most affordable
Integration Guide
See the Anthropic Integration page to start tracking costs.
Cost Comparison
Most Affordable Models (Under $1/M tokens)
Best for high-volume, simple tasks:
- GPT-5 Nano: $0.40 output
- Gemini 2.0 Flash Lite: $0.30 output
- Gemini 1.5 Flash: $0.30 output
- GPT-4.1 Nano: $0.40 output
- Gemini 2.5 Flash Lite: $0.40 output
- GPT-4o Mini: $0.60 output
- Claude 3 Haiku: $1.25 output
- Claude 3.5 Haiku: $4.00 output
Mid-Range Models ($1-$5/M tokens)
Balanced performance and cost:
- Claude Haiku 4.5: $5.00 output
- GPT-4.1: $8.00 output
- GPT-4o: $10.00 output
- Claude Sonnet 4.5: $15.00 output
- Claude Opus 4.5: $25.00 output
Premium Models ($10+/M tokens)
Image Generation (Per-Image Pricing)
AI image generation models:
- Imagen 4 Fast: $0.02 per image
- DALL-E 3 Standard (1024×1024): $0.040 per image
- Imagen 4 Standard: $0.04 per image
- gemini-2.5-flash-image: $0.039 per image (+ token costs)
- Imagen 4 Ultra: $0.06 per image
- DALL-E 3 HD (1024×1024): $0.080 per image
- DALL-E 3 Large (1792×1024): $0.120 per image
Video Generation (Per-Second Pricing)
AI video generation models:
- Veo 3.0/3.1 Fast: $0.15 per second
- Veo 3.0/3.1 Standard: $0.40 per second
Audio Models (Transcription & TTS)
Audio transcription and text-to-speech models:
- Whisper-1: $0.006 per minute ($0.0001/second)
- TTS-1 (Standard): $15.00 per 1M characters
- TTS-1-HD (High Definition): $30.00 per 1M characters
How Pricing Works
Token Calculation
LLM Ops tracks both input and output tokens separately:
- Input tokens = Your prompt + any system messages + conversation history
- Output tokens = The model’s response
Cost Calculation
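Your total cost is calculated as:
Total cost = (input tokens × input price + output tokens × output price) ÷ 1,000,000, where both prices are per 1 million tokens.
Image & Video Generation Pricing
Image Generation Models
Models like DALL-E 3, Imagen 4, and gemini-2.5-flash-image generate images and are priced differently:
DALL-E 3 (Per-Image): cost = number of images × price per image
Video Generation Models
Veo Models (Per-Second): cost = video duration in seconds × price per second
Audio Transcription Models
Whisper (Per-Second): cost = audio duration in seconds × price per second
Text-to-Speech Models
TTS Models (Per-Character): cost = input characters × price per 1M characters ÷ 1,000,000
Provider Counting:
Image, video, and audio generation costs are calculated based on what the provider reports: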
- Images (DALL-E 3, Imagen 4): Provider returns number of images generated
- Videos (Veo): Provider returns video duration in seconds
- Audio (Whisper): Provider returns audio duration in seconds
- TTS (TTS-1, TTS-1-HD): Provider returns character count of input text
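The per-unit pricing described above can be sketched as a handful of small functions, one per billing unit. This is a minimal sketch, not LLM Ops code: the dictionary keys (such as `"dall-e-3-hd"` and `"veo-fast"`) and the function names are hypothetical labels, while the rates are the ones listed on this page.

```python
from decimal import Decimal

# Per-image prices in USD (keys are hypothetical labels for this sketch).
IMAGE_PRICE = {
    "dall-e-3-standard": Decimal("0.040"),
    "dall-e-3-hd": Decimal("0.080"),
    "imagen-4-fast": Decimal("0.02"),
}

# Per-second video prices in USD.
VIDEO_PRICE_PER_SECOND = {
    "veo-fast": Decimal("0.15"),
    "veo-standard": Decimal("0.40"),
}

# Whisper is listed at $0.006 per minute (about $0.0001 per second).
# The per-minute rate is used here to avoid compounding the rounding.
WHISPER_PRICE_PER_MINUTE = Decimal("0.006")

# Text-to-speech prices in USD per 1 million input characters.
TTS_PRICE_PER_MILLION_CHARS = {
    "tts-1": Decimal("15.00"),
    "tts-1-hd": Decimal("30.00"),
}


def image_cost(tier: str, images: int) -> Decimal:
    """Cost of `images` generated images, as counted by the provider."""
    return IMAGE_PRICE[tier] * images


def video_cost(tier: str, seconds: float) -> Decimal:
    """Cost of a generated video of the given duration in seconds."""
    return VIDEO_PRICE_PER_SECOND[tier] * Decimal(str(seconds))


def transcription_cost(seconds: float) -> Decimal:
    """Cost of transcribing audio of the given duration in seconds."""
    return WHISPER_PRICE_PER_MINUTE * Decimal(str(seconds)) / 60


def tts_cost(model: str, characters: int) -> Decimal:
    """Cost of synthesizing speech from `characters` input characters."""
    return TTS_PRICE_PER_MILLION_CHARS[model] * characters / 1_000_000
```

For instance, `video_cost("veo-fast", 8)` prices an 8-second clip at the Fast rate, and `tts_cost("tts-1", 2000)` prices a 2,000-character input at the standard TTS rate.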
Multimodal Pricing
Two Types of Multimodal Models
1. Multimodal Understanding Models (Token-Based)
- These models analyze images, videos, and audio
- Input media is converted to tokens by the provider
- Charged per token (text + media tokens combined)
- Examples: GPT-4o, Claude Opus, Gemini 2.5 Flash
2. Multimodal Generation Models
- These models create images, videos, or audio
- DALL-E 3, Imagen 4: Charged per image generated
- Veo 3: Charged per second of video generated
- Whisper: Charged per second of audio transcribed
- TTS: Charged per character of text input
- gemini-2.5-flash-image: Hybrid (token-based, but image output uses premium rate)
How Multimodal Tokens Are Tracked:
For understanding models, providers (OpenAI, Anthropic, Google) automatically convert images, video, and audio into tokens and include them in the response. LLM Ops tracks the total token count returned by the provider.
Image/video/audio input tokens are included in `input_tokens`; they are not tracked separately.
For generation models, we track based on what the provider charges:
- Text tokens: Standard input/output pricing
- Images generated: Per-image or per-token (depending on model)
- Video generated: Per-second of video
- Audio transcribed: Per-second of audio
- TTS generated: Per-character of input text
Multimodal Understanding Models (Token-Based)
These models analyze images, video, and audio sent as input:
- Images
- Audio
- Video
Models with image understanding:
- Gemini 2.0/2.5 Flash
- GPT-4o
- GPT-4o Mini
- Claude Opus 4/4.5
- Claude Sonnet 4/4.5
How it works:
- You send an image with your prompt
- Provider converts image to tokens based on resolution
- Provider returns total `input_tokens` (text + image)
- LLM Ops tracks the total as input tokens
- Cost = `input_tokens` × `input_price`
Pricing Breakdown Summary
Here’s how different types of content are charged:

| Content Type | Understanding (Input) | Generation (Output) | Examples |
|---|---|---|---|
| Text | Per token | Per token | All models |
| Images (Input) | Converted to tokens by provider; included in `input_tokens` | N/A | GPT-4o, Claude Opus, Gemini Flash |
| Audio (Input) | Converted to tokens by provider; included in `input_tokens`; some models charge a premium rate | N/A | GPT-4o Audio, Gemini Flash (audio input priced above the $0.30/1M text rate) |
| Audio Transcription | Per second ($0.006/min) | Text output (per token) | whisper-1 |
| Video (Input) | Converted to tokens by provider; included in `input_tokens` | N/A | Gemini 2.5 Flash |
| Images (Output) | N/A | DALL-E 3: Per image (up to $0.120); Imagen 4: Per image (up to $0.06); gemini-2.5-flash-image: Per token ($0.039/image) | dall-e-3, imagen-4.0-fast, gemini-2.5-flash-image |
| Video (Output) | N/A | Per second (up to $0.40/sec) | veo-3.0-fast, veo-3.1 |
| Audio (Output - TTS) | N/A | OpenAI TTS: Per character (up to $30/1M); Gemini TTS: Per token ($20/1M) | tts-1, tts-1-hd, gemini-2.5-flash-tts |
Key Takeaways:
- Understanding (Input): Media → Tokens → Cost per token
- Generation (Output):
  - Images: Per image (DALL-E 3, Imagen 4) OR per token (gemini-2.5-flash-image)
  - Video: Per second (Veo)
  - Audio Transcription: Per second (Whisper)
  - Text-to-Speech: Per character (OpenAI TTS) OR per token (Gemini TTS)
- Provider Controls Conversion: We trust provider counts
- Not Tracked Separately: Input media tokens are combined with text tokens in `input_tokens`
Example: Image Token Calculation
- Input Tokens: 769 (includes both text and image)
- Output Tokens: 50
- Total Cost: $0.0024225
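The figures in this example are consistent with the per-token formula if the request was priced at $2.50 per 1M input tokens and $10.00 per 1M output tokens. The $10.00 output rate matches the GPT-4o entry on this page; the $2.50 input rate is an assumption inferred from the total, since the page does not list it. A minimal sketch of the arithmetic, with hypothetical names:

```python
from decimal import Decimal

# USD per 1 million tokens.
INPUT_PRICE_PER_M = Decimal("2.50")    # assumed: not listed on this page
OUTPUT_PRICE_PER_M = Decimal("10.00")  # GPT-4o output rate listed on this page


def request_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Apply the per-million-token formula to one request."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# 769 input tokens already include the image tokens reported by the provider.
total_cost = request_cost(input_tokens=769, output_tokens=50)
```

Because the image tokens arrive inside `input_tokens`, the calculation needs no separate image term: 769 × 2.50 + 50 × 10.00 = 2,422.50, and dividing by 1,000,000 gives the $0.0024225 shown above.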
Pricing Updates
Model pricing is set by the providers (Anthropic, OpenAI, Google) and can change at any time.
How we handle updates:
- ✅ We monitor provider pricing pages daily
- ✅ Updates are applied within 24 hours of provider changes
- ✅ Historical data uses pricing from the time of request
- ✅ You’re notified of major pricing changes
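One way to picture "historical data uses pricing from the time of request" is an effective-dated price list that is searched by request date. The sketch below is an assumption about how such a lookup could work, not a description of the LLM Ops implementation; `PRICE_HISTORY`, `price_on`, and `historical_cost` are hypothetical, and the dates and rates are invented purely for illustration.

```python
from bisect import bisect_right
from datetime import date
from decimal import Decimal

# Hypothetical price history for a single model, sorted by effective date:
# (effective date, input USD per 1M tokens, output USD per 1M tokens).
# These values are placeholders, not real provider prices.
PRICE_HISTORY = [
    (date(2025, 1, 1), Decimal("4.00"), Decimal("20.00")),
    (date(2025, 7, 1), Decimal("3.00"), Decimal("15.00")),
]


def price_on(request_date: date) -> tuple[Decimal, Decimal]:
    """Return the (input, output) prices in effect on the request date."""
    effective_dates = [entry[0] for entry in PRICE_HISTORY]
    # Index of the last entry whose effective date is <= request_date.
    index = bisect_right(effective_dates, request_date) - 1
    if index < 0:
        raise LookupError(f"no pricing recorded on or before {request_date}")
    _, input_price, output_price = PRICE_HISTORY[index]
    return input_price, output_price


def historical_cost(request_date: date, input_tokens: int, output_tokens: int) -> Decimal:
    """Cost of a past request, using the prices that applied at that time."""
    input_price, output_price = price_on(request_date)
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
```

With this shape, a later price change only appends a new entry; requests dated before the change keep resolving to the earlier rates.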
Last pricing update: January 10, 2026
Check this page regularly for pricing updates.
Need a Model Added?
If you’re using a model that’s not listed here:
1. Check Provider Documentation
   Verify the model exists in your provider’s official API docs
2. Contact Us
   Email [email protected] with:
   - Model name
   - Provider (Anthropic/OpenAI/Google)
   - Link to provider pricing
3. We'll Add It
   We typically add new models within 2-3 business days

