Supported AI Models - Cloudidr Documentation

Overview

CloudIDR LLM Ops supports the models listed below for API cost tracking. Your spend is calculated from each provider's published pricing.

Last Updated: January 10, 2026

Model pricing is subject to change by the providers. We update our pricing regularly to ensure accurate cost tracking.

Model Not Listed?

If your model is not in this list, please contact us at support@cloudidr.com and we’ll add support for it.

Pricing Tables

Anthropic
OpenAI
Google

Anthropic Claude Models

All pricing is per 1 million tokens.

Model	Model ID	Input Cost	Output Cost
Claude Opus 4.5	claude-opus-4-5-20251101	$5.00	$25.00
Claude Opus 4.1	claude-opus-4-1-20250805	$15.00	$75.00
Claude Opus 4	claude-opus-4-20250514	$15.00	$75.00
Claude Sonnet 4.5	claude-sonnet-4-5-20250929	$3.00	$15.00
Claude Sonnet 4	claude-sonnet-4-20250514	$3.00	$15.00
Claude Haiku 4.5	claude-haiku-4-5-20251001	$1.00	$5.00
Claude 3.5 Haiku	claude-3-5-haiku-20241022	$0.80	$4.00
Claude 3 Haiku	claude-3-haiku-20240307	$0.25	$1.25

Model Recommendations:

Opus - Most capable, best for complex reasoning
Sonnet - Balanced performance and cost
Haiku - Fastest and most affordable

Integration Guide

See the Anthropic Integration page to start tracking costs.

OpenAI Models

All pricing is per 1 million tokens unless otherwise noted.

GPT-5 Family

Model	Input Cost	Output Cost
gpt-5.2	$1.75	$14.00
gpt-5.2-chat-latest	$1.75	$14.00
gpt-5.2-pro	$21.00	$168.00
gpt-5.1	$1.25	$10.00
gpt-5.1-chat-latest	$1.25	$10.00
gpt-5.1-codex-max	$1.25	$10.00
gpt-5.1-codex	$1.25	$10.00
gpt-5.1-codex-mini	$0.25	$2.00
gpt-5	$1.25	$10.00
gpt-5-chat-latest	$1.25	$10.00
gpt-5-codex	$1.25	$10.00
gpt-5-pro	$15.00	$120.00
gpt-5-mini	$0.25	$2.00
gpt-5-nano	$0.05	$0.40
gpt-5-search-api	$1.25	$10.00

GPT-4.1 Family

Model	Input Cost	Output Cost
gpt-4.1	$2.00	$8.00
gpt-4.1-mini	$0.40	$1.60
gpt-4.1-nano	$0.10	$0.40

GPT-4o Family

Model	Input Cost	Output Cost
gpt-4o	$2.50	$10.00
gpt-4o-2024-05-13	$5.00	$15.00
gpt-4o-mini	$0.15	$0.60
gpt-4o-mini-2024-07-18	$0.15	$0.60
gpt-4o-search-preview	$2.50	$10.00
gpt-4o-mini-search-preview	$0.15	$0.60

Realtime & Audio Models

Model	Input Cost	Output Cost
gpt-realtime	$4.00	$16.00
gpt-realtime-mini	$0.60	$2.40
gpt-4o-realtime-preview	$5.00	$20.00
gpt-4o-mini-realtime-preview	$0.60	$2.40
gpt-audio	$2.50	$10.00
gpt-audio-mini	$0.60	$2.40
gpt-4o-audio-preview	$2.50	$10.00
gpt-4o-mini-audio-preview	$0.15	$0.60

Image Generation Models (Per-Image Pricing)

Model	Resolution	Quality	Price Per Image
dall-e-3	1024×1024	Standard	$0.040
dall-e-3	1024×1792	Standard	$0.080
dall-e-3	1792×1024	Standard	$0.080
dall-e-3	1024×1024	HD	$0.080
dall-e-3	1024×1792	HD	$0.120
dall-e-3	1792×1024	HD	$0.120

Audio Transcription Models (Per-Second Pricing)

Model	Price Per Second	Notes
whisper-1	$0.0001	$0.006/min transcription

Text-to-Speech Models (Per-Character Pricing)

Model	Price Per 1M Characters	Notes
tts-1	$15.00	Standard quality
tts-1-hd	$30.00	High definition

o-Series (Reasoning Models)

Model	Input Cost	Output Cost
o1	$15.00	$60.00
o1-pro	$150.00	$600.00
o1-mini	$1.10	$4.40
o3	$2.00	$8.00
o3-pro	$20.00	$80.00
o3-mini	$1.10	$4.40
o3-deep-research	$10.00	$40.00
o4-mini	$1.10	$4.40
o4-mini-deep-research	$2.00	$8.00

GPT-4 Legacy

Model	Input Cost	Output Cost
gpt-4	$30.00	$60.00
gpt-4-turbo	$10.00	$30.00
gpt-4-turbo-preview	$10.00	$30.00

GPT-3.5 Family

Model	Input Cost	Output Cost
gpt-3.5-turbo	$0.50	$1.50
gpt-3.5-turbo-16k	$3.00	$4.00

Model Recommendations:

gpt-4o - Best balance of capability and speed
gpt-4o-mini - Most cost-effective for simple tasks
o1/o3 - Advanced reasoning for complex problems
dall-e-3 - High-quality image generation
whisper-1 - Audio transcription at $0.006/minute
tts-1 - Natural text-to-speech

Integration Guide

See the OpenAI Integration page to start tracking costs.

Google Gemini Models

All pricing is per 1 million tokens unless otherwise noted.

Gemini 3 Series

Model	Input Cost	Output Cost	Special Rates
gemini-3-pro-preview	$2.00	$12.00
gemini-3-pro-image-preview	$2.00	$12.00	Image Output: $120.00
gemini-3-flash-preview	$0.50	$3.00	Audio Input: $1.00

Gemini 2.5 Series

Model	Input Cost	Output Cost	Special Rates
Pro Models
gemini-2.5-pro	$1.25	$10.00
gemini-2.5-pro-preview-tts	$1.00	$20.00	TTS: Audio output
Flash Models
gemini-2.5-flash	$0.30	$2.50	Audio Input: $1.00
gemini-2.5-flash-preview-09-2025	$0.30	$2.50	Audio Input: $1.00
gemini-2.5-flash-preview-tts	$0.50	$10.00	TTS: Audio output
gemini-2.5-flash-image	$0.30	$2.50	Image Output: $30.00
Flash-Lite Models
gemini-2.5-flash-lite	$0.10	$0.40	Audio Input: $0.30
gemini-2.5-flash-lite-preview-09-2025	$0.10	$0.40	Audio Input: $0.30
Specialized Models
gemini-2.5-computer-use-preview-10-2025	$1.25	$10.00
gemini-2.5-flash-native-audio-preview-12-2025	$0.50	$2.00	Audio In: $3.00 Audio Out: $12.00

Gemini 2.0 Series

Model	Input Cost	Output Cost	Special Rates
gemini-2.0-flash	$0.10	$0.40	Audio Input: $0.70
gemini-2.0-flash-lite	$0.075	$0.30

Latest Aliases (Dynamic)

Model	Input Cost	Output Cost	Maps To
gemini-flash-latest	$0.30	$2.50	gemini-2.5-flash
gemini-pro-latest	$1.25	$10.00	gemini-2.5-pro
gemini-flash-lite-latest	$0.10	$0.40	gemini-2.5-flash-lite
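
The alias table above can be read as a simple lookup. The sketch below is only an illustration: `LATEST_ALIASES` and `resolve_model` are hypothetical names, not part of LLM Ops, and because the aliases are dynamic the mapping is just a snapshot of the table as listed on this page.

```python
# Snapshot of the "Maps To" column above; the provider may repoint these aliases.
LATEST_ALIASES = {
    "gemini-flash-latest": "gemini-2.5-flash",
    "gemini-pro-latest": "gemini-2.5-pro",
    "gemini-flash-lite-latest": "gemini-2.5-flash-lite",
}


def resolve_model(name: str) -> str:
    """Return the concrete model an alias maps to, or the name unchanged."""
    return LATEST_ALIASES.get(name, name)


print(resolve_model("gemini-flash-latest"))
print(resolve_model("gemini-2.0-flash"))
```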

Gemini 1.5 Series (Legacy)

Model	Input Cost	Output Cost
gemini-1.5-pro	$1.25	$5.00
gemini-1.5-flash	$0.075	$0.30

Other Legacy Models

Model	Input Cost	Output Cost
gemini-pro	$0.50	$1.50
gemini-flash	$0.075	$0.30
palm-2	$0.50	$1.50

Image Generation Models (Per-Image Pricing)

Model	Price Per Image	Notes
imagen-4.0-fast-generate-001	$0.02	Fast generation
imagen-4.0-generate-001	$0.04	Standard generation
imagen-4.0-ultra-generate-001	$0.06	Ultra quality

Video Generation Models (Per-Second Pricing)

Model	Price Per Second	Notes
veo-3.1-fast-generate-preview	$0.15	Fast generation (Preview)
veo-3.1-generate-preview	$0.40	Standard (Preview)
veo-3.0-fast-generate-001	$0.15	Fast generation (Stable)
veo-3.0-generate-001	$0.40	Standard (Stable)

Model Recommendations:

gemini-2.5-pro - Most capable for complex tasks
gemini-2.5-flash - Best cost/performance balance
gemini-2.5-flash-lite - Most affordable option
gemini-2.5-flash-image - Native image generation
imagen-4.0-fast - Fast image generation at $0.02/image
veo-3.0-fast - Fast video generation at $0.15/second

Special Pricing Notes:

TTS Models: Output is audio tokens, not text
Audio Input: Premium pricing for audio/video multimodal input
Image Generation (Imagen): Charged per image generated, not per token
Video Generation (Veo): Charged per second of video generated
gemini-2.5-flash-image:
- Text input: $0.30 per 1M tokens
- Text output: $2.50 per 1M tokens
- Image output: $30.00 per 1M tokens (equivalent to $0.039 per image)

Integration Guide

See the Gemini Integration page to start tracking costs.

Cost Comparison

Most Affordable Models (Under $1/M input tokens)

Best for high-volume, simple tasks:

GPT-5 Nano: $0.05 input / $0.40 output
Gemini 2.0 Flash Lite: $0.075 input / $0.30 output
Gemini 1.5 Flash: $0.075 input / $0.30 output
GPT-4.1 Nano: $0.10 input / $0.40 output
Gemini 2.5 Flash Lite: $0.10 input / $0.40 output
GPT-4o Mini: $0.15 input / $0.60 output
Claude 3 Haiku: $0.25 input / $1.25 output
Claude 3.5 Haiku: $0.80 input / $4.00 output

Perfect for: Classification, extraction, simple Q&A, high-throughput tasks

Mid-Range Models ($1-$5/M input tokens)

Balanced performance and cost:

Claude Haiku 4.5: $1.00 input / $5.00 output
GPT-4.1: $2.00 input / $8.00 output
GPT-4o: $2.50 input / $10.00 output
Claude Sonnet 4.5: $3.00 input / $15.00 output
Claude Opus 4.5: $5.00 input / $25.00 output

Perfect for: Customer support, content generation, code assistance

Premium Models ($10+/M input tokens)

Advanced reasoning and complex tasks:

Claude Opus 4: $15.00 input / $75.00 output
o1: $15.00 input / $60.00 output
o3: $2.00 input / $8.00 output (below the $10 threshold; listed with the other reasoning models)
o1-pro: $150.00 input / $600.00 output
GPT-5 Pro: $15.00 input / $120.00 output
GPT-4 (legacy): $30.00 input / $60.00 output

Perfect for: Complex reasoning, research, code generation, expert analysis

Image Generation (Per-Image Pricing)

AI image generation models:

Imagen 4 Fast: $0.02 per image
DALL-E 3 Standard (1024×1024): $0.040 per image
Imagen 4 Standard: $0.04 per image
gemini-2.5-flash-image: $0.039 per image (+ token costs)
Imagen 4 Ultra: $0.06 per image
DALL-E 3 HD (1024×1024): $0.080 per image
DALL-E 3 Large (1792×1024): $0.080-$0.120 per image

Perfect for: Marketing materials, product images, illustrations

Video Generation (Per-Second Pricing)

AI video generation models:

Veo 3.0/3.1 Fast: $0.15 per second
Veo 3.0/3.1 Standard: $0.40 per second

Perfect for: Marketing videos, product demos, content creation

Audio Models (Transcription & TTS)

Audio transcription models:

Whisper-1: $0.006 per minute ($0.0001/second)

Text-to-speech models:

TTS-1 (Standard): $15.00 per 1M characters
TTS-1-HD (High Definition): $30.00 per 1M characters

Perfect for: Transcription services, voice assistants, audiobooks, accessibility

How Pricing Works

Token Calculation

LLM Ops tracks both input and output tokens separately:

Input tokens = Your prompt + any system messages + conversation history
Output tokens = The model’s response

Example:

Input: "Write a haiku about AI" (6 tokens)
Output: "Silicon dreams flow / Algorithms learn and grow / Future unfolds now" (15 tokens)

Cost with GPT-4o:
Input: 6 tokens × $2.50/1M = $0.000015
Output: 15 tokens × $10.00/1M = $0.00015
Total: $0.000165

Cost Calculation

Your total cost is calculated as follows, where each price is the per-1M-token rate from the tables above:

Total Cost = (Input Tokens × Input Price / 1,000,000) + (Output Tokens × Output Price / 1,000,000)

All costs are tracked in real-time and displayed in your LLM Ops Dashboard.
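
As a rough illustration of the calculation above, here is a minimal Python sketch. The function name `token_cost` and its parameters are assumptions made for this example and are not part of the LLM Ops API. It uses `Decimal` so that very small per-request amounts are not distorted by floating-point rounding.

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal(1_000_000)


def token_cost(input_tokens: int, output_tokens: int,
               input_price_per_m: str, output_price_per_m: str) -> Decimal:
    """Return the USD cost of one request, given per-1M-token prices."""
    input_cost = Decimal(input_tokens) * Decimal(input_price_per_m) / TOKENS_PER_MILLION
    output_cost = Decimal(output_tokens) * Decimal(output_price_per_m) / TOKENS_PER_MILLION
    return input_cost + output_cost


# The haiku example from this page, priced with the gpt-4o rates ($2.50 / $10.00).
print(token_cost(6, 15, "2.50", "10.00"))
```

Prices are passed as strings so that `Decimal` receives the exact listed value rather than a binary float approximation.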

Image & Video Generation Pricing

Image Generation Models

Models like DALL-E 3, Imagen 4, and gemini-2.5-flash-image generate images and are priced differently:

DALL-E 3 (Per-Image):

Cost = Number of Images × Price Per Image

Example (Standard 1024×1024):
Generate 5 images with dall-e-3
Cost = 5 images × $0.040 = $0.20

Example (HD 1024×1792):
Generate 3 images with dall-e-3 HD quality
Cost = 3 images × $0.120 = $0.36
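
Because the dall-e-3 price depends on both resolution and quality, the per-image calculation is effectively a table lookup followed by a multiplication. The sketch below assumes the prices listed in the dall-e-3 table on this page; `DALLE3_PRICES` and `dalle3_cost` are hypothetical names used only for illustration.

```python
from decimal import Decimal

# Per-image prices copied from the dall-e-3 table on this page.
DALLE3_PRICES = {
    ("1024x1024", "standard"): Decimal("0.040"),
    ("1024x1792", "standard"): Decimal("0.080"),
    ("1792x1024", "standard"): Decimal("0.080"),
    ("1024x1024", "hd"): Decimal("0.080"),
    ("1024x1792", "hd"): Decimal("0.120"),
    ("1792x1024", "hd"): Decimal("0.120"),
}


def dalle3_cost(num_images: int, size: str, quality: str = "standard") -> Decimal:
    """Return the USD cost of generating num_images at the given size and quality."""
    key = (size, quality.lower())
    if key not in DALLE3_PRICES:
        raise ValueError(f"No dall-e-3 price listed for {size} / {quality}")
    return num_images * DALLE3_PRICES[key]


print(dalle3_cost(5, "1024x1024"))        # Standard example above
print(dalle3_cost(3, "1024x1792", "hd"))  # HD example above
```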

Imagen 4 Models (Per-Image):

Cost = Number of Images × Price Per Image

Example:
Generate 5 images with imagen-4.0-fast-generate-001
Cost = 5 images × $0.02 = $0.10

gemini-2.5-flash-image (Token-Based):

Text Input Cost = Input Tokens × $0.30/1M
Text Output Cost = Text Output Tokens × $2.50/1M
Image Output Cost = Image Output Tokens × $30.00/1M

Example:
Input: "Generate a sunset image" (100 tokens)
Output: Text description (50 tokens) + Image (1,290 tokens)

Text Input: 100 × $0.30/1M = $0.00003
Text Output: 50 × $2.50/1M = $0.000125
Image Output: 1,290 × $30.00/1M = $0.0387
Total: $0.038855

Note: Each image consumes approximately 1,290 output tokens, which works out to about $0.039 per image.
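
The three-rate calculation for gemini-2.5-flash-image can be sketched as follows. The function name `flash_image_cost` is an assumption for this example, the rates are the ones listed on this page, and the 1,290-token figure is the approximate per-image count quoted above rather than a guaranteed value.

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)

# Rates per 1M tokens, as listed for gemini-2.5-flash-image on this page.
TEXT_INPUT_RATE = Decimal("0.30")
TEXT_OUTPUT_RATE = Decimal("2.50")
IMAGE_OUTPUT_RATE = Decimal("30.00")


def flash_image_cost(text_in: int, text_out: int, image_out: int) -> Decimal:
    """Return the USD cost of one request, with each token type at its own rate."""
    return (
        text_in * TEXT_INPUT_RATE
        + text_out * TEXT_OUTPUT_RATE
        + image_out * IMAGE_OUTPUT_RATE
    ) / PER_MILLION


# The sunset example: 100 input tokens, 50 text output tokens, 1,290 image tokens.
print(flash_image_cost(100, 50, 1290))
```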

Video Generation Models

Veo Models (Per-Second):

Cost = Video Duration (seconds) × Price Per Second

Example:
Generate 10-second video with veo-3.0-fast-generate-001
Cost = 10 seconds × $0.15 = $1.50

Audio Transcription Models

Whisper (Per-Second):

Cost = Audio Duration (seconds) × Price Per Second

Example:
Transcribe 125.5-second audio file with whisper-1
Cost = 125.5 seconds × $0.0001 = $0.01255

Note: $0.0001/second = $0.006/minute

Text-to-Speech Models

TTS Models (Per-Character):

Cost = (Characters / 1,000,000) × Price Per Million Characters

Example (Standard):
Generate speech from 1,250 characters with tts-1
Cost = (1,250 / 1,000,000) × $15.00 = $0.01875

Example (HD):
Generate speech from 1,250 characters with tts-1-hd
Cost = (1,250 / 1,000,000) × $30.00 = $0.0375

Provider Counting:

Image, video, and audio generation costs are calculated based on what the provider reports:

Images (DALL-E 3, Imagen 4): Provider returns number of images generated
Videos (Veo): Provider returns video duration in seconds
Audio (Whisper): Provider returns audio duration in seconds
TTS (TTS-1, TTS-1-HD): Provider returns character count of input text

We trust the provider’s counts and multiply by our pricing table.
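
The "provider count times pricing table" rule can be sketched as a small lookup. This is an illustration under stated assumptions, not the LLM Ops implementation: `UNIT_PRICES` and `unit_cost` are hypothetical names, and only a few models from this page are included.

```python
from decimal import Decimal

# (billing unit, price per unit) taken from the tables on this page.
# TTS prices are listed per 1M characters, so they are scaled to per-character here.
UNIT_PRICES = {
    "imagen-4.0-fast-generate-001": ("image", Decimal("0.02")),
    "veo-3.0-fast-generate-001": ("second", Decimal("0.15")),
    "whisper-1": ("second", Decimal("0.0001")),
    "tts-1": ("character", Decimal("15.00") / 1_000_000),
    "tts-1-hd": ("character", Decimal("30.00") / 1_000_000),
}


def unit_cost(model: str, provider_count) -> Decimal:
    """Multiply the count reported by the provider by the per-unit price."""
    _unit, price = UNIT_PRICES[model]
    # str() keeps fractional durations such as 125.5 exact when converting.
    return Decimal(str(provider_count)) * price


print(unit_cost("veo-3.0-fast-generate-001", 10))  # 10-second video
print(unit_cost("whisper-1", 125.5))               # 125.5-second audio file
print(unit_cost("tts-1", 1250))                    # 1,250 characters of text
```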

Multimodal Pricing

Two Types of Multimodal Models

1. Multimodal Understanding Models (Token-Based)

These models analyze images, videos, and audio
Input media is converted to tokens by the provider
Charged per token (text + media tokens combined)
Examples: GPT-4o, Claude Opus, Gemini 2.5 Flash

2. Media Generation Models (Per-Unit)

These models create images, video, or speech, or transcribe audio
DALL-E 3, Imagen 4: Charged per image generated
Veo 3: Charged per second of video generated
Whisper: Charged per second of audio transcribed
TTS: Charged per character of text input
gemini-2.5-flash-image: Hybrid (token-based, but image output uses premium rate)

How Multimodal Tokens Are Tracked:

For understanding models, providers (OpenAI, Anthropic, Google) automatically convert images, video, and audio into tokens and include them in the response. LLM Ops tracks the total token count returned by the provider.

Image/video/audio input tokens are included in input_tokens - they are not tracked separately.

For generation models, we track based on what the provider charges:

Text tokens: Standard input/output pricing
Images generated: Per-image or per-token (depending on model)
Video generated: Per-second of video
Audio transcribed: Per-second of audio
TTS generated: Per-character of input text

Multimodal Understanding Models (Token-Based)

These models analyze images, video, and audio sent as input:

Images
Audio
Video

Models with image understanding:

Gemini 2.0/2.5 Flash
GPT-4o
GPT-4o Mini
Claude Opus 4/4.5
Claude Sonnet 4/4.5

How it works:

You send an image with your prompt
Provider converts image to tokens based on resolution
Provider returns total input_tokens (text + image)
LLM Ops tracks the total as input tokens
Cost = input_tokens × input_price

Note: Higher resolution images = more input tokens = higher cost

Models with audio support:

GPT-4o Audio Preview
GPT-4o Realtime Preview
Gemini 2.0 Flash
Gemini 2.5 Flash (with audio input premium)

How it works:

You send audio with your prompt
Provider converts audio to tokens based on duration
Provider returns total input_tokens (text + audio)
LLM Ops tracks the total as input tokens
Cost = input_tokens × input_price

Models with video support:

Gemini 2.0/2.5 Flash
Gemini 1.5 Pro

How it works:

You send video with your prompt
Provider converts video to tokens (duration × resolution × frames)
Provider returns total input_tokens (text + video)
LLM Ops tracks the total as input tokens
Cost = input_tokens × input_price

Note: Longer videos at higher resolution = significantly more input tokens

Pricing Breakdown Summary

Here’s how different types of content are charged:

Content Type	Understanding (Input)	Generation (Output)	Examples
Text	Per token	Per token	All models
Images (Input)	Converted to tokens by provider; included in `input_tokens`	N/A	GPT-4o, Claude Opus, Gemini Flash
Audio (Input)	Converted to tokens by provider; included in `input_tokens`; some models charge a premium rate	N/A	GPT-4o Audio, Gemini Flash (Audio input: $1.00/1M vs $0.30/1M text)
Audio Transcription	Per second ($0.0001/sec = $0.006/min)	Text output (per token)	whisper-1
Video (Input)	Converted to tokens by provider; included in `input_tokens`	N/A	Gemini 2.5 Flash
Images (Output)	N/A	DALL-E 3: Per image ($0.040-$0.120); Imagen 4: Per image ($0.02-$0.06); gemini-2.5-flash-image: Per token ($30/1M = ~$0.039/image)	dall-e-3, imagen-4.0-fast, gemini-2.5-flash-image
Video (Output)	N/A	Per second ($0.15-$0.40/sec)	veo-3.0-fast, veo-3.1
Audio (Output - TTS)	N/A	OpenAI TTS: Per character ($15-$30/1M); Gemini TTS: Per token ($10-$20/1M)	tts-1, tts-1-hd, gemini-2.5-flash-tts

Key Takeaways:

Understanding (Input): Media → Tokens → Cost per token
Generation (Output):
- Images: Per image (DALL-E 3, Imagen 4) OR per token (gemini-2.5-flash-image)
- Video: Per second (Veo)
- Audio Transcription: Per second (Whisper)
- Text-to-Speech: Per character (OpenAI TTS) OR per token (Gemini TTS)
Provider Controls Conversion: We trust provider counts
Not Tracked Separately: Input media tokens are combined with text tokens in input_tokens

Example: Image Token Calculation

Request:
- Text prompt: "Describe this image" (4 tokens)
- Image: 1024x1024 JPG (converted to 765 tokens by provider)

Provider Response:
{
  "usage": {
    "input_tokens": 769,     // 4 text + 765 image
    "output_tokens": 50      // Response tokens
  }
}

Cost Calculation (using GPT-4o pricing):
Input: 769 tokens × $2.50/1M = $0.0019225
Output: 50 tokens × $10.00/1M = $0.0005
Total: $0.0024225

LLM Ops Dashboard shows:

Input Tokens: 769 (includes both text and image)
Output Tokens: 50
Total Cost: $0.0024225
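
The example above can be reproduced from the usage payload alone, which is why the text and image tokens cannot be priced separately. The sketch below assumes the `usage` shape shown in the example (without the inline comments, so that it is valid JSON); `cost_from_usage` is a hypothetical helper, not an LLM Ops function.

```python
import json
from decimal import Decimal

# The provider response from the example, as valid JSON.
RESPONSE_BODY = '{"usage": {"input_tokens": 769, "output_tokens": 50}}'


def cost_from_usage(body: str, input_price_per_m: str, output_price_per_m: str) -> Decimal:
    """Price a request from the provider-reported token totals only."""
    usage = json.loads(body)["usage"]
    # input_tokens already combines text and image tokens (4 + 765 here).
    input_cost = usage["input_tokens"] * Decimal(input_price_per_m)
    output_cost = usage["output_tokens"] * Decimal(output_price_per_m)
    return (input_cost + output_cost) / Decimal(1_000_000)


print(cost_from_usage(RESPONSE_BODY, "2.50", "10.00"))
```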

Image/Audio/Video Breakdown Not Available:

LLM Ops does not currently separate multimodal tokens from text tokens. All input tokens (text + image + video + audio) are tracked together as input_tokens.

If you need separate multimodal token tracking, please contact us at support@cloudidr.com.

Pricing Updates

Model pricing is set by the providers (Anthropic, OpenAI, Google) and can change at any time.

How we handle updates:

✅ We monitor provider pricing pages daily
✅ Updates are applied within 24 hours of provider changes
✅ Historical data uses pricing from the time of request
✅ You’re notified of major pricing changes
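
One way to picture "historical data uses pricing from the time of request" is a date-effective lookup. The sketch below is purely illustrative: the model, dates, and prices in `PRICE_HISTORY` are invented for the example, and `price_on` is a hypothetical helper that does not describe how LLM Ops stores pricing internally.

```python
import bisect
from datetime import date
from decimal import Decimal

# Hypothetical input-price history for an imaginary model: (effective date, $/1M tokens).
# Entries must be sorted by effective date.
PRICE_HISTORY = [
    (date(2025, 1, 1), Decimal("3.00")),
    (date(2025, 7, 1), Decimal("2.50")),
]


def price_on(request_date: date) -> Decimal:
    """Return the price that was in effect on the day the request was made."""
    effective_dates = [effective for effective, _ in PRICE_HISTORY]
    # Find the last entry whose effective date is on or before the request date.
    index = bisect.bisect_right(effective_dates, request_date) - 1
    if index < 0:
        raise ValueError("No price recorded for that date")
    return PRICE_HISTORY[index][1]


print(price_on(date(2025, 3, 15)))  # request made before the price change
print(price_on(date(2025, 12, 1)))  # request made after the price change
```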

Last pricing update: January 10, 2026

Check this page regularly for pricing updates.

Need a Model Added?

If you’re using a model that’s not listed here:

Check Provider Documentation

Verify the model exists in your provider’s official API docs

Email support@cloudidr.com with:

Model name
Provider (Anthropic/OpenAI/Google)
Link to provider pricing

We'll Add It

We typically add new models within 2-3 business days

Next Steps

Anthropic Integration

Start tracking Claude costs

OpenAI Integration

Start tracking GPT costs

Google Integration

Start tracking Gemini costs
