OCI · GENERATIVE AI

OCI GenAI Catalog

Source: OCI Official Documentation  |  Updated 17 Apr 2026  |  30+ models

5 Model Providers · 26 Chat Models (Active) · 9 Embedding Models · 1 Rerank Model · 79 Imported Models
Chat Models — Cohere Family
Model Model ID Tier Parameters Context Multimodal Reasoning Tool Use Fine-tunable Status Regions Best For
Cohere
Command A Reasoning
cohere.command-a-reasoning ★ Flagship Reasoning 111B 256K ✓ Advanced ● GA (Aug 2025) US-ASH US-CHI US-PHX SA-SAO EU-FRA UK-LON ME-RUH ME-DXB AP-HYD AP-OSA Complex Q&A, multi-step reasoning, document analysis, structured arguments
Cohere
Command A Vision
cohere.command-a-vision ★ Flagship Multimodal 112B 128K ✓ Images, Charts, Docs ● GA (Jul 2025) US-ASH US-CHI US-PHX SA-SAO EU-FRA UK-LON ME-RUH ME-DXB AP-HYD AP-OSA Enterprise document understanding with charts & images
Cohere
Command A
cohere.command-a-03-2025 ◉ Flagship Chat 111B 256K ✓ Advanced ● GA (Mar 2025) US-ASH US-CHI SA-SAO EU-FRA UK-LON ME-RUH ME-DXB AP-HYD AP-OSA Agentic enterprise tasks, RAG, multilingual, high-throughput production
Cohere
Command R+ 08-2024
cohere.command-r-plus-08-2024 ◉ Advanced 104B 128K ● Active US-ASH US-CHI SA-SAO EU-FRA UK-LON ME-RUH ME-DXB AP-OSA Complex specialized tasks, Q&A, sentiment, multilingual RAG
Cohere
Command R 08-2024
cohere.command-r-08-2024 ▷ Standard 35B 128K ✓ T-Few / Vanilla ● Active US-ASH US-CHI SA-SAO EU-FRA UK-LON ME-RUH AP-OSA RAG pipelines, info retrieval, cost-efficient enterprise chat
Chat Models — Google Gemini Family
Model Model ID Tier Parameters Context Multimodal Reasoning Tool Use Fine-tunable Status Regions Best For
Google
Gemini 2.5 Pro
google.gemini-2.5-pro ★ Flagship 1M ✓ Text, Image, Code, Audio, Video ✓ Advanced Reasoning ● GA US-ASH US-CHI US-PHX EU-FRA AP-OSA Most complex multimodal problems, large dataset analysis, SOTA reasoning tasks
Google
Gemini 2.5 Flash
google.gemini-2.5-flash ◉ Balanced 1M ✓ Text, Image, Code, Audio, Video ✓ Thinking features ● GA US-ASH US-CHI US-PHX EU-FRA AP-HYD AP-OSA Balanced workloads needing speed + intelligence, complex applications
Google
Gemini 2.5 Flash-Lite
google.gemini-2.5-flash-lite ○ Budget / Fast 1M ✓ Text, Image, Code, Audio, Video ● GA US-ASH US-CHI US-PHX EU-FRA High-volume, simpler tasks; cost-sensitive production workloads
Chat Models — Meta Llama Family
Model Model ID Tier Parameters Context Multimodal Reasoning Tool Use Fine-tunable Status Regions Best For
Meta
Llama 4 Maverick
meta.llama-4-maverick-17b-128e-instruct-fp8 ★ Flagship MoE ~400B 512K ✓ Text + Image ● GA (2025) US-CHI SA-SAO UK-LON ME-RUH AP-HYD AP-OSA Multimodal understanding, multilingual, coding, agentic systems, large-scale inference
Meta
Llama 4 Scout
meta.llama-4-scout-17b-16e-instruct ◉ Efficient MoE ~109B 192K ✓ Text + Image ● GA (2025) US-CHI SA-SAO UK-LON ME-RUH AP-HYD AP-OSA Smaller GPU deployments, efficient multimodal, multilingual, coding
Meta
Llama 3.3 70B
meta.llama-3.3-70b-instruct ◉ Best Text 70B 128K ✓ LoRA ● GA US-CHI US-PHX SA-SAO EU-FRA UK-LON ME-RUH ME-DXB AP-HYD AP-OSA Best text-only 70B tasks; outperforms 3.1 70B and 3.2 90B on text benchmarks
Meta
Llama 3.2 90B Vision
meta.llama-3.2-90b-vision-instruct ◉ Vision Flagship 90B 128K ✓ Text + Image ● Active US-CHI SA-SAO UK-LON ME-RUH AP-OSA Multimodal understanding with large model capacity, image reasoning
Meta
Llama 3.2 11B Vision
meta.llama-3.2-11b-vision-instruct ▷ Compact Vision 11B 128K ✓ Text + Image ● Active (Dedicated only) US-CHI SA-SAO UK-LON AP-OSA Cost-efficient multimodal; resource-constrained deployments
Meta
Llama 3.1 405B
meta.llama-3.1-405b-instruct ★ Largest Open 405B 128K ● Active US-CHI SA-SAO EU-FRA UK-LON AP-OSA Highest text quality open model; complex reasoning, advanced generation
Chat Models — OpenAI gpt-oss Family
Model Model ID Tier Parameters Context Multimodal Reasoning Tool Use Fine-tunable Status Regions Best For
OpenAI
gpt-oss-120b
openai.gpt-oss-120b ★ Flagship OSS 120B 128K ✓ Advanced Reasoning + Agentic ✓ Advanced Tool Use ● GA US-ASH US-CHI US-PHX SA-SAO EU-FRA UK-LON ME-RUH ME-DXB AP-HYD AP-OSA Reasoning, agentic tasks; outperforms similar-size open models; OpenAI-compatible API
OpenAI
gpt-oss-20b
openai.gpt-oss-20b ▷ Efficient OSS 20B 128K ✓ Reasoning + Agentic ● GA US-ASH US-CHI US-PHX SA-SAO EU-FRA UK-LON ME-RUH ME-DXB AP-HYD AP-OSA Efficient consumer-hardware-optimized reasoning; agentic tasks at lower cost
Chat Models — xAI Grok Family
Model Model ID Tier Parameters Context Multimodal Reasoning Tool Use Fine-tunable Status Regions Best For
xAI
Grok 4
xai.grok-4 ★ Flagship 128K ✓ Text + Image ✓ Advanced ● GA US-ASH US-CHI US-PHX Advanced multimodal reasoning, enterprise data extraction, coding, summarization
xAI
Grok 4 Fast
xai.grok-4-fast-reasoning
xai.grok-4-fast-non-reasoning
▷ Fast Flagship 2M ✓ Text + Image ✓ Reasoning + Non-Reasoning modes ● GA US-ASH US-CHI US-PHX Same capability as Grok 4 with 2M context; cost-speed-optimized production
xAI
Grok 4.1 Fast
xai.grok-4-1-fast-reasoning
xai.grok-4-1-fast-non-reasoning
★ Agentic Flagship 2M ✓ Text + Image ✓ Reasoning + Non-Reasoning modes ✓ Parallel Tool Calling ● GA US-ASH US-CHI US-PHX Complex agentic systems, customer support, research with 2M multimodal context
xAI
Grok 4.20
xai.grok-4.20-reasoning
xai.grok-4.20-non-reasoning
xai.grok-4.20-0309-reasoning
xai.grok-4.20-0309-non-reasoning
★ Latest Flagship 2M ✓ Text + Image ✓ Reasoning + Non-Reasoning Variants ✓ Advanced Agentic ● GA (Mar 2026) US-ASH US-CHI US-PHX Latest-gen multimodal agentic reasoning with 2M context and dual reasoning modes
xAI
Grok 4.20 Multi-Agent
xai.grok-4.20-multi-agent
xai.grok-4.20-multi-agent-0309
◆ Multi-Agent Research 2M ✓ Text + Image ✓ Orchestrated multi-agent reasoning ✓ Multi-Agent Orchestration ● GA (Mar 2026) US-ASH US-CHI US-PHX Real-time multi-agent research — parallel web search, data analysis & synthesis by specialized sub-agents
xAI
Grok 3
xai.grok-3 ◉ Standard 131K ● GA US-ASH US-CHI US-PHX General enterprise tasks, data extraction, text summarization
xAI
Grok 3 Fast
xai.grok-3-fast ▷ Standard Fast 131K ● GA US-ASH US-CHI US-PHX High-throughput enterprise tasks at Grok 3 quality level
xAI
Grok 3 Mini
xai.grok-3-mini ○ Lightweight Thinker 131K ✓ Traces exposed ● GA US-ASH US-CHI US-PHX Logic-based tasks not requiring deep domain knowledge; transparent thinking traces
xAI
Grok 3 Mini Fast
xai.grok-3-mini-fast ○ Lightweight Fast 131K ✓ Traces exposed ● GA US-ASH US-CHI US-PHX Low-latency logic tasks at minimum cost
xAI
Grok Code Fast 1
xai.grok-code-fast-1 ◆ Coding Specialist 256K ✓ Summarized traces ✓ Agentic Coding ● GA (Aug 2025) US-ASH US-CHI US-PHX TypeScript, Python, Java, Rust, C++, Go; zero-to-one projects, bug fixes, agentic coding loops
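Every hosted model above is addressed by a provider-qualified ID of the form `provider.model-name` (e.g. `cohere.command-a-reasoning`, `meta.llama-3.3-70b-instruct`). A minimal sketch of splitting such IDs — `parse_model_id` is a hypothetical helper for illustration, not part of any OCI SDK:

```python
def parse_model_id(model_id: str) -> dict:
    """Split an OCI GenAI model ID into its provider prefix and model name.

    Only the first dot separates provider from name, so IDs with dotted
    versions (e.g. "xai.grok-4.20-reasoning") keep the dots in the name.
    """
    provider, _, name = model_id.partition(".")
    if not name:
        raise ValueError(f"not a provider-qualified model ID: {model_id!r}")
    return {"provider": provider, "name": name}

print(parse_model_id("cohere.command-a-reasoning"))
# → {'provider': 'cohere', 'name': 'command-a-reasoning'}
```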
Embedding Models
Model Name Model ID Generation Multimodal (Image) Language Scope Size Variant Use Case Status Regions
Cohere
Embed 4
cohere.embed-v4.0 Gen 4 · Latest ✓ Text + Image (base64) Multilingual Full Latest multimodal embeddings; text & image semantic search ● Active US-ASH US-CHI SA-SAO EU-FRA UK-LON ME-RUH ME-DXB AP-HYD AP-OSA
Cohere
Embed English Image 3
cohere.embed-english-image-v3.0 Gen 3 ✓ Text + Image English Full English-only text+image semantic search ● Active US-ASH US-CHI SA-SAO EU-FRA UK-LON ME-DXB AP-OSA
Cohere
Embed English Light Image 3
cohere.embed-english-light-image-v3.0 Gen 3 ✓ Text + Image English Light Cost-efficient English text+image embeddings ● Active US-ASH US-CHI SA-SAO EU-FRA UK-LON ME-DXB AP-OSA
Cohere
Embed Multilingual Image 3
cohere.embed-multilingual-image-v3.0 Gen 3 ✓ Text + Image Multilingual Full Global multilingual text+image semantic search ● Active US-ASH US-CHI SA-SAO EU-FRA UK-LON ME-DXB AP-HYD AP-OSA
Cohere
Embed Multilingual Light Image 3
cohere.embed-multilingual-light-image-v3.0 Gen 3 ✓ Text + Image Multilingual Light Budget multilingual text+image embeddings ● Active US-ASH US-CHI SA-SAO EU-FRA UK-LON ME-DXB AP-OSA
Cohere
Embed English 3
cohere.embed-english-v3.0 Gen 3 ✗ Text only English Full Pure text English semantic search, classification, clustering ● Active US-CHI SA-SAO EU-FRA UK-LON
Cohere
Embed English Light 3
cohere.embed-english-light-v3.0 Gen 3 ✗ Text only English Light Cost-efficient English text embeddings at scale ● Active US-CHI SA-SAO
Cohere
Embed Multilingual 3
cohere.embed-multilingual-v3.0 Gen 3 ✗ Text only Multilingual Full Global enterprise text semantic search in 100+ languages ● Active US-ASH US-CHI US-PHX SA-SAO EU-FRA UK-LON AP-HYD AP-OSA
Cohere
Embed Multilingual Light 3
cohere.embed-multilingual-light-v3.0 Gen 3 ✗ Text only Multilingual Light Affordable multilingual text embeddings at volume ● Active US-CHI SA-SAO
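The embedding models above all serve the same downstream pattern: embed a corpus and a query, then rank by vector similarity. The sketch below shows that pattern with made-up 4-dimensional placeholder vectors; a real call to a model such as cohere.embed-v4.0 returns much higher-dimensional vectors per input text.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Placeholder "embeddings" standing in for real model output.
corpus = {
    "refund policy": [0.9, 0.1, 0.0, 0.2],
    "shipping times": [0.1, 0.8, 0.3, 0.0],
    "password reset": [0.0, 0.2, 0.9, 0.1],
}
query_vec = [0.85, 0.15, 0.05, 0.1]  # pretend embedding of "how do I get my money back"

ranked = sorted(corpus, key=lambda doc: cosine_similarity(query_vec, corpus[doc]), reverse=True)
print(ranked[0])  # → refund policy
```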
Rerank Model
Model Name Model ID Input Output Use Case Status Regions
Cohere
Rerank 3.5
cohere.rerank.v3-5 Query + List of texts Ordered array with relevance scores RAG pipelines, document ranking, search result reordering, precision improvement ● Active US-ASH US-CHI SA-SAO EU-FRA UK-LON ME-RUH AP-OSA
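The rerank contract described above — a query plus a list of texts in, an array ordered by relevance score out — can be illustrated locally. The keyword-overlap scorer here is a toy stand-in for Rerank 3.5's learned relevance function, and `rerank` is a hypothetical helper, not the SDK call; only the input/output shape is meant to match.

```python
def rerank(query: str, documents: list[str]) -> list[dict]:
    """Return documents ordered by a toy relevance score (fraction of query terms matched)."""
    q_terms = set(query.lower().split())
    scored = [
        {"index": i, "document": doc,
         "relevance_score": len(q_terms & set(doc.lower().split())) / len(q_terms)}
        for i, doc in enumerate(documents)
    ]
    return sorted(scored, key=lambda r: r["relevance_score"], reverse=True)

docs = [
    "reset your password via email",
    "refund requests take 5 days",
    "password rules and reset steps",
]
top = rerank("password reset steps", docs)[0]
print(top["index"], top["relevance_score"])  # → 2 1.0
```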
Imported Models

Available via OCI Generative AI Model Import — import open-weights models from HuggingFace into your own dedicated GPU cluster endpoint. 8 provider families · 79 models · supports fine-tuned variants within ±10% parameter count.
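The ±10% rule quoted above amounts to a one-line check. `within_import_limit` is a hypothetical helper shown only to make the arithmetic concrete; OCI performs its own validation at import time.

```python
def within_import_limit(base_params: float, variant_params: float, tolerance: float = 0.10) -> bool:
    """True if a fine-tuned variant's parameter count is within ±tolerance of the base model's."""
    return abs(variant_params - base_params) <= tolerance * base_params

print(within_import_limit(32e9, 34e9))  # 34B variant of a 32B base: 6.25% over → True
print(within_import_limit(32e9, 36e9))  # 36B variant: 12.5% over → False
```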

Alibaba Qwen Family
Model Name · HuggingFace Model ID · Type · Params · Context · Cluster Shape
Alibaba
QwQ-32B
Qwen/QwQ-32B Reasoning 32B 128K A100_80G_X2
Alibaba
Qwen Image
Qwen/Qwen-Image Image Gen A100_80G_X1
Alibaba
Qwen Image Edit
Qwen/Qwen-Image-Edit Image Gen A100_80G_X1
Alibaba
Qwen Image 2512
Qwen/Qwen-Image-2512 Image Gen A100_80G_X1
Alibaba
Qwen Image Edit 2511
Qwen/Qwen-Image-Edit-2511 Image Gen A100_80G_X1
Alibaba
Qwen Image Edit 2509
Qwen/Qwen-Image-Edit-2509 Image Gen A100_80G_X1
Alibaba
Qwen3-Embedding-0.6B
Qwen/Qwen3-Embedding-0.6B Embed 0.6B 32K A10_X1
Alibaba
Qwen3-Embedding-4B
Qwen/Qwen3-Embedding-4B Embed 4B 32K A10_X2
Alibaba
Qwen3-Embedding-8B
Qwen/Qwen3-Embedding-8B Embed 8B 32K A100_80G_X1
Alibaba
Qwen3-0.6B
Qwen/Qwen3-0.6B Chat 0.6B 32K A100_80G_X1
Alibaba
Qwen3-1.7B
Qwen/Qwen3-1.7B Chat 1.7B 32K A100_80G_X1
Alibaba
Qwen3-4B
Qwen/Qwen3-4B Chat 4B 32K A100_80G_X1
Alibaba
Qwen3-8B
Qwen/Qwen3-8B Chat 8B 32K A100_80G_X1
Alibaba
Qwen3-14B
Qwen/Qwen3-14B Chat 14B 32K A100_80G_X1
Alibaba
Qwen3-32B
Qwen/Qwen3-32B Chat 32B 32K A100_80G_X2
Alibaba
Qwen3-4B-Instruct-2507
Qwen/Qwen3-4B-Instruct-2507 Chat 4B 32K A100_80G_X1
Alibaba
Qwen3-30B-A3B-Instruct-2507
Qwen/Qwen3-30B-A3B-Instruct-2507 Chat 30B (3B active) 32K A100_80G_X2
Alibaba
Qwen3-235B-A22B-Instruct-2507
Qwen/Qwen3-235B-A22B-Instruct-2507 Chat 235B (22B active) 32K H100_X8
Alibaba
Qwen3-VL-30B-A3B-Instruct
Qwen/Qwen3-VL-30B-A3B-Instruct Vision 30B (3B active) H100_X2
Alibaba
Qwen3-VL-235B-A22B-Instruct
Qwen/Qwen3-VL-235B-A22B-Instruct Vision 235B (22B active) H100_X8
Alibaba
Qwen2.5-Coder-32B-Instruct
Qwen/Qwen2.5-Coder-32B-Instruct Coder 32B 128K A100_80G_X2
Alibaba
Qwen2.5-0.5B-Instruct
Qwen/Qwen2.5-0.5B-Instruct Chat 0.5B 128K A100_80G_X1
Alibaba
Qwen2.5-1.5B-Instruct
Qwen/Qwen2.5-1.5B-Instruct Chat 1.5B 128K A100_80G_X1
Alibaba
Qwen2.5-3B-Instruct
Qwen/Qwen2.5-3B-Instruct Chat 3B 128K A100_80G_X1
Alibaba
Qwen2.5-7B-Instruct
Qwen/Qwen2.5-7B-Instruct Chat 7B 128K A100_80G_X1
Alibaba
Qwen2.5-14B-Instruct
Qwen/Qwen2.5-14B-Instruct Chat 14B 128K A100_80G_X1
Alibaba
Qwen2.5-32B-Instruct
Qwen/Qwen2.5-32B-Instruct Chat 32B 128K A100_80G_X2
Alibaba
Qwen2.5-72B-Instruct
Qwen/Qwen2.5-72B-Instruct Chat 72B 128K A100_80G_X4
Alibaba
Qwen2.5-VL-3B-Instruct
Qwen/Qwen2.5-VL-3B-Instruct Vision 3B A100_80G_X1
Alibaba
Qwen2.5-VL-7B-Instruct
Qwen/Qwen2.5-VL-7B-Instruct Vision 7B A100_80G_X1
Alibaba
Qwen2.5-VL-32B-Instruct
Qwen/Qwen2.5-VL-32B-Instruct Vision 32B A100_80G_X2
Alibaba
Qwen2.5-VL-72B-Instruct
Qwen/Qwen2.5-VL-72B-Instruct Vision 72B A100_80G_X4
Alibaba
Qwen2-0.5B-Instruct
Qwen/Qwen2-0.5B-Instruct Chat 0.5B 32K A100_80G_X1
Alibaba
Qwen2-1.5B-Instruct
Qwen/Qwen2-1.5B-Instruct Chat 1.5B 32K A100_80G_X1
Alibaba
Qwen2-7B-Instruct
Qwen/Qwen2-7B-Instruct Chat 7B 128K A100_80G_X1
Alibaba
Qwen2-72B-Instruct
Qwen/Qwen2-72B-Instruct Chat 72B 128K A100_80G_X4
Alibaba
Qwen2-VL-2B-Instruct
Qwen/Qwen2-VL-2B-Instruct Vision 2B A100_80G_X1
Alibaba
Qwen2-VL-7B-Instruct
Qwen/Qwen2-VL-7B-Instruct Vision 7B A100_80G_X1
Alibaba
Qwen2-VL-72B-Instruct
Qwen/Qwen2-VL-72B-Instruct Vision 72B A100_80G_X4
DeepSeek Family
Model Name · HuggingFace Model ID · Type · Params · Context · Cluster Shape
DeepSeek
DeepSeek-R1-Distill-Qwen-32B
deepseek-ai/DeepSeek-R1-Distill-Qwen-32B Reasoning 32B 128K A100_80G_X2
Google Gemma Family
Model Name · HuggingFace Model ID · Type · Params · Context · Cluster Shape
Gemma
Gemma 3 270M
google/gemma-3-270m-it Chat 270M 128K A100_80G_X1
Gemma
Gemma 3 1B
google/gemma-3-1b-it Chat 1B 128K A100_80G_X1
Gemma
Gemma 3 4B
google/gemma-3-4b-it Vision 4B 128K A100_80G_X1
Gemma
Gemma 3 12B
google/gemma-3-12b-it Vision 12B 128K A100_80G_X1
Gemma
Gemma 3 27B
google/gemma-3-27b-it Vision 27B 128K A100_80G_X2
Gemma
Gemma 2 2B
google/gemma-2-2b-it Chat 2B 8K A100_80G_X1
Gemma
Gemma 2 9B
google/gemma-2-9b-it Chat 9B 8K A100_80G_X1
Gemma
Gemma 2 27B
google/gemma-2-27b-it Chat 27B 8K A100_80G_X2
Meta Llama Family
Model Name · HuggingFace Model ID · Type · Params · Context · Cluster Shape
Meta
Llama 4 Maverick 17B
meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 Vision 17B×128E 1M H100_X8
Meta
Llama 4 Scout 17B
meta-llama/Llama-4-Scout-17B-16E-Instruct Vision 17B×16E 1M H100_X4
Meta
Llama 3.3 70B Instruct
meta-llama/Llama-3.3-70B-Instruct Chat 70B 128K A100_80G_X4
Meta
Llama 3.2 3B Instruct
meta-llama/Llama-3.2-3B-Instruct Chat 3B 128K A100_80G_X1
Meta
Llama 3.2 1B Instruct
meta-llama/Llama-3.2-1B-Instruct Chat 1B 128K A100_80G_X1
Meta
Llama 3.1 8B Instruct
meta-llama/Llama-3.1-8B-Instruct Chat 8B 128K A100_80G_X1
Meta
Llama 3 8B Instruct
meta-llama/Meta-Llama-3-8B-Instruct Chat 8B 8K A100_80G_X1
Meta
Llama 3 70B Instruct
meta-llama/Meta-Llama-3-70B-Instruct Chat 70B 8K A100_80G_X4
Meta
Llama 2 70B Chat
meta-llama/Llama-2-70b-chat-hf Chat 70B 4K A100_80G_X4
Meta
Llama 2 13B Chat
meta-llama/Llama-2-13b-chat-hf Chat 13B 4K A100_80G_X1
Meta
Llama 2 7B Chat
meta-llama/Llama-2-7b-chat-hf Chat 7B 4K A100_80G_X1
Microsoft Phi Family
Model Name · HuggingFace Model ID · Type · Params · Context · Cluster Shape
Microsoft
Phi-4
microsoft/phi-4 Chat 14B 16K A100_80G_X1
Microsoft
Phi-3 Vision 128K
microsoft/Phi-3-vision-128k-instruct Vision 4.2B 128K H100_X1
Microsoft
Phi-3 Medium 128K
microsoft/Phi-3-medium-128k-instruct Chat 14B 128K A100_80G_X1
Microsoft
Phi-3 Medium 4K
microsoft/Phi-3-medium-4k-instruct Chat 14B 4K A100_80G_X1
Microsoft
Phi-3 Small 128K
microsoft/Phi-3-small-128k-instruct Chat 7B 128K A100_80G_X1
Microsoft
Phi-3 Small 8K
microsoft/Phi-3-small-8k-instruct Chat 7B 8K A100_80G_X1
Microsoft
Phi-3 Mini 128K
microsoft/Phi-3-mini-128k-instruct Chat 3.8B 128K A100_80G_X1
Microsoft
Phi-3 Mini 4K
microsoft/Phi-3-mini-4k-instruct Chat 3.8B 4K A100_80G_X1
Mistral Family
Model Name · HuggingFace Model ID · Type · Params · Context · Cluster Shape
Mistral
Mixtral 8x7B Instruct v0.1
mistralai/Mixtral-8x7B-Instruct-v0.1 Chat 8×7B MoE 32K A100_80G_X2
Mistral
Mistral Nemo Instruct 2407
mistralai/Mistral-Nemo-Instruct-2407 Chat 12B 128K A100_80G_X1
Mistral
Mistral 7B Instruct v0.3
mistralai/Mistral-7B-Instruct-v0.3 Chat 7B 32K A100_80G_X1
Mistral
Mistral 7B Instruct v0.2
mistralai/Mistral-7B-Instruct-v0.2 Chat 7B 32K A100_80G_X1
Mistral
Mistral 7B Instruct v0.1
mistralai/Mistral-7B-Instruct-v0.1 Chat 7B 8K A100_80G_X1
Mistral
E5 Mistral 7B Instruct
intfloat/e5-mistral-7b-instruct Embed 7B 32K A10_X1
NVIDIA Nemotron Family
Model Name · HuggingFace Model ID · Type · Params · Context · Cluster Shape
NVIDIA
Nemotron 3 Super 120B
nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 Chat 120B (12B active) 1M H100_X8
NVIDIA
Nemotron 3 Nano 30B (FP8)
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 Chat 30B (3B active) 1M H100_X4
NVIDIA
Nemotron 3 Nano 30B (BF16)
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 Chat 30B (3B active) 1M A100_80G_X1
NVIDIA
Llama 3.1 Nemotron 70B Instruct
nvidia/Llama-3.1-Nemotron-70B-Instruct-HF Chat 70B 128K A100_80G_X4
OpenAI GptOss Family
Model Name · HuggingFace Model ID · Type · Params · Context · Cluster Shape
OpenAI
GptOss 20B
openai/gpt-oss-20b Chat 20B 128K H100_X1
OpenAI
GptOss 120B
openai/gpt-oss-120b Chat 120B 128K H100_X2
Use-Case Selection Guide

🔍 RAG / Document Search

🥇
Cohere Command A Reasoning
256K context, built for RAG, advanced reasoning over documents
Dedicated only
🥈
Cohere Command R 08-2024
RAG-optimized, 128K context, fine-tunable, cost-efficient
On-demand + Ded.
🥉
Google Gemini 2.5 Pro
1M context — handle entire documents in one pass
On-demand (ext.)

🤖 Agentic / Tool-Use Workflows

🥇
xAI Grok 4.20 Multi-Agent
Real-time multi-agent research: parallel specialist orchestration for web search, analysis & synthesis
On-demand (ext.)
🥈
xAI Grok 4.1 Fast
2M context, parallel tool calls, vendor-reported 3× reduction in hallucinations
On-demand (ext.)
🥉
Cohere Command A
256K context, best throughput for Cohere agentic tasks
On-demand + Ded.

💻 Code Generation

🥇
xAI Grok Code Fast 1
Specialized coding model — plan, write, test, debug loop
On-demand (ext.)
🥈
Meta Llama 4 Maverick
MoE, strong coding + tool-calling capabilities
On-demand + Ded.
🥉
Google Gemini 2.5 Pro
Top-tier code reasoning, debugging, complex architecture
On-demand (ext.)

🌍 Multimodal (Text + Image)

🥇
Google Gemini 2.5 Pro
Best multimodal — text, image, code, audio, video
On-demand (ext.)
🥈
Cohere Command A Vision
Enterprise-focused image, chart, document understanding
On-demand + Ded.
🥉
Meta Llama 4 Maverick
Open-weight multimodal with MoE efficiency
On-demand + Ded.

⚡ High-Volume / Low-Latency

🥇
Google Gemini 2.5 Flash-Lite
Fastest + cheapest in Gemini family; 1M context
On-demand (ext.)
🥈
xAI Grok 3 Mini Fast
Lightweight thinker, lowest latency xAI model
On-demand (ext.)
🥉
OpenAI gpt-oss-20b
Consumer-grade hardware optimized, fast reasoning
On-demand + Ded.

🏢 Enterprise Fine-Tuning

🥇
Cohere Command R 08-2024
T-Few + Vanilla fine-tuning on dedicated AI clusters
Dedicated only
🥈
Meta Llama 3.3 70B
LoRA fine-tuning, best 70B text performance
On-demand + Ded.

🌐 Multilingual Applications

🥇
Cohere Command A
Native multilingual support, 256K context, high-throughput production
On-demand + Ded.
🥈
Meta Llama 4 Scout / Maverick
Open-weight multilingual models with strong cross-language performance
On-demand + Ded.
🔤
Cohere Embed Multilingual 3 / Embed 4
Semantic search in 100+ languages; Gen 4 adds multimodal
On-demand + Ded.

📄 Long-context Document Analysis

🥇
xAI Grok 4.1 Fast
2M tokens — process full codebases or entire document archives in one pass
On-demand (ext.)
🥈
Google Gemini 2.5 Pro
1M context with multimodal; ideal for large PDFs, reports, mixed media
On-demand (ext.)
🥉
xAI Grok 4 Fast
2M context at optimized cost; great for batch document workloads
On-demand (ext.)

🇪🇺 EU Data Sovereignty (Frankfurt · London)

On-demand + Dedicated — EU-FRA & UK-LON
🥇
Cohere Command A
256K context, multilingual, RAG + agentic — widest EU access
EU-FRA + UK-LON
🥈
Meta Llama 3.3 70B
Open-weight, LoRA fine-tunable, strong text performance
EU-FRA + UK-LON
🥉
OpenAI gpt-oss-120b / 20b
OpenAI-compatible API, advanced reasoning — Frankfurt on-demand + dedicated
EU-FRA + UK-LON
Dedicated Only — EU-FRA & UK-LON
🔒
Cohere Command A Reasoning
Advanced reasoning over documents — tenancy-exclusive GPUs in both EU regions
Dedicated · EU-FRA + UK-LON
🔒
Meta Llama 4 Maverick / Scout
Latest Llama 4 multimodal models — UK-LON dedicated only
Dedicated · UK-LON
🔤
Cohere Embed 3 / 4 + Rerank 3.5
Semantic search & reranking with EU data residency
Dedicated · EU-FRA + UK-LON
On-demand only (ext.) — EU-FRA
🌐
Google Gemini 2.5 Pro / Flash / Flash-Lite
External API routed through EU-FRA endpoint — data does not reside on OCI hardware
On-demand (ext.) · EU-FRA
⚠️
xAI Grok — No EU Presence
All xAI Grok models are US-only (Ashburn, Chicago, Phoenix) — not suitable for EU data residency requirements
Not available in EU

🏗️ Dedicated AI Clusters

🔒
Required for Fine-Tuning
Fine-tuning jobs run exclusively on dedicated GPU clusters — Cohere T-Few/Vanilla and Meta LoRA cannot run on-demand
Dedicated only · Cohere Command R · Llama 3.3 70B
🏢
Data Residency & Compliance
Tenancy-exclusive GPUs; your data never shares hardware — suited for regulated industries (GDPR, HIPAA, financial)
Dedicated · All Cohere · All Meta · OpenAI gpt-oss
⚠️
Not Available — Google & xAI
Google Gemini and xAI Grok route through external APIs — dedicated clusters are not supported for these providers
On-demand (ext.) only

⚡ On-demand (No Cluster Needed)

🚀
Google Gemini & xAI Grok — Always On-demand
External API call; no cluster provisioning, instant availability, pay-per-token billing
On-demand (ext.)
🌍
Cohere & OpenAI gpt-oss — On-demand in Select Regions
On-demand access available without a dedicated cluster — ideal for PoC and variable workloads
On-demand + Ded.
📝
Fine-Tuning Requires Dedicated
On-demand mode supports inference only — to fine-tune a model you must provision a dedicated AI cluster first
Dedicated required
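The inferencing-mode rules above (Google Gemini and xAI Grok served only via external on-demand APIs; fine-tuning only on dedicated clusters) can be condensed into a small routing helper. This is an illustrative summary of the catalog's statements, not an OCI API — all names are made up for the sketch.

```python
# Providers this catalog lists as routed through external APIs (no dedicated clusters).
EXTERNAL_ONLY = {"google", "xai"}

def serving_mode(provider: str, fine_tune: bool = False) -> str:
    """Pick a serving mode from the catalog's rules; raise if the combination is unsupported."""
    provider = provider.lower()
    if provider in EXTERNAL_ONLY:
        if fine_tune:
            raise ValueError(f"{provider}: dedicated clusters (and thus fine-tuning) are not supported")
        return "on-demand (external)"
    # Cohere, Meta, and OpenAI gpt-oss: on-demand inference, dedicated for fine-tuning.
    return "dedicated" if fine_tune else "on-demand or dedicated"

print(serving_mode("xai"))                     # → on-demand (external)
print(serving_mode("cohere", fine_tune=True))  # → dedicated
```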
LEGEND
★ Flagship / Best-in-class
◉ Balanced / Advanced
▷ Speed / Efficiency tier
○ Lightweight / Budget
◆ Specialized
2M/1M Context ≥ 1M tokens
256K Context = 192K–512K tokens
128K Context = 128K tokens
✓ Feature supported
✗ Not supported
MoE Mixture of Experts (sparse activation)
US-CHI Region: on-demand + dedicated
EU-FRA Region: dedicated AI clusters only
US-ASH Region: on-demand / external call only

¹ Parameter counts are shown only when officially disclosed by the provider. Proprietary models (Google Gemini, xAI Grok) do not publish parameter counts and are omitted. "—" means not publicly disclosed.

² Fine-tuning on OCI uses dedicated AI clusters (GPU resources belonging exclusively to your tenancy). Cohere supports T-Few & Vanilla strategies; Meta Llama supports LoRA.

³ Retired/deprecated models (Command R 16K, Command R+, Llama 3 70B, Llama 3.1 70B) are omitted from the main tables.

⁴ Model Import feature (GA 2025) lets you bring your own LLMs from Hugging Face or OCI Object Storage.

⁵ OCI documents Grok 4 at 128K context and Grok 4 Fast, Grok 4.1 Fast, Grok 4.20, and Grok 4.20 Multi-Agent at 2M context.

⁶ Data sources (OCI Official Documentation; catalog last updated 17 April 2026; OC1 commercial regions only): Pretrained Models · Models by Region · Inferencing Modes · Model Import

AI-generated content — not an official Oracle document. This page was assembled with AI assistance from OCI public documentation. Data may contain errors or be out of date. Always verify against docs.oracle.com before making production decisions.
GitHub Repository: models.json · imported-models.json · @enricopesce