OCI · GENERATIVE AI

OCI GenAI Catalog

Source: OCI Official Documentation  |  Updated 13 Mar 2026  |  30+ models

Provider Context
5 Model Providers · 24 Chat Models (Active) · 9 Embedding Models · 1 Rerank Model
Chat Models — Cohere Family
Columns: Model Name · Model ID · Tier · Parameters · Context Window · Multimodal · Fine-tunable · Tool Use / Agents · Reasoning · RAG Optimized · Status · Regions · Best For
Chat Models — Google Gemini Family
Columns: Model Name · Model ID · Tier · Context Window · Multimodal · Thinking / Reasoning · Speed Profile · Fine-tunable on OCI · Status · Regions · Best For
Chat Models — Meta Llama Family
Columns: Model Name · Model ID · Tier · Architecture · Total Params · Active Params (MoE) · Context Window · Multimodal · Fine-tunable (LoRA) · Tool Use · Status · Regions · Best For
Chat Models — OpenAI gpt-oss Family
Columns: Model Name · Model ID · Tier · Parameters · Context Window · Reasoning / Agentic · Tool Use · Open Source · Fine-tunable on OCI · Status · Regions · Best For
Chat Models — xAI Grok Family
Columns: Model Name · Model ID · Tier · Context Window · Multimodal · Thinking / Chain-of-Thought · Coding Focus · Speed Profile · Domain Knowledge · Agentic · Status · Regions · Best For
Embedding Models
Columns: Model Name · Model ID · Generation · Multimodal (Image) · Language Scope · Size Variant · Use Case · Status · Regions
Rerank Model
Columns: Model Name · Model ID · Input · Output · Use Case · Status · Regions
Use-Case Selection Guide

🔍 RAG / Document Search

🥇 Cohere Command A Reasoning · 256K context, built for RAG, advanced reasoning over documents (Dedicated only)
🥈 Cohere Command R 08-2024 · RAG-optimized, 128K context, fine-tunable, cost-efficient (On-demand + Dedicated)
🥉 Google Gemini 2.5 Pro · 1M+ context; handle entire documents in one pass (On-demand, ext.)

🤖 Agentic / Tool-Use Workflows

🥇 xAI Grok 4.1 Fast · 2M context, parallel tool calls, 3x lower hallucinations (On-demand, ext.)
🥈 Cohere Command A · 256K context, best throughput for Cohere agentic tasks (On-demand + Dedicated)
🥉 OpenAI gpt-oss-120b · Advanced tool use, reasoning, OpenAI-compatible API (On-demand + Dedicated)

💻 Code Generation

🥇 xAI Grok Code Fast 1 · Specialized coding model for the plan, write, test, debug loop (On-demand, ext.)
🥈 Meta Llama 4 Maverick · MoE architecture with strong coding + tool-calling capabilities (On-demand + Dedicated)
🥉 Google Gemini 2.5 Pro · Top-tier code reasoning, debugging, complex architecture (On-demand, ext.)

🌍 Multimodal (Text + Image)

🥇 Google Gemini 2.5 Pro · Best multimodal: text, image, code, audio, video (On-demand, ext.)
🥈 Cohere Command A Vision · Enterprise-focused image, chart, and document understanding (On-demand + Dedicated)
🥉 Meta Llama 4 Maverick · Open-weight multimodal with MoE efficiency (On-demand + Dedicated)

⚡ High-Volume / Low-Latency

🥇 Google Gemini 2.5 Flash-Lite · Fastest and cheapest in the Gemini family; 1M context (On-demand, ext.)
🥈 xAI Grok 3 Mini Fast · Lightweight thinking model, lowest-latency xAI option (On-demand, ext.)
🥉 OpenAI gpt-oss-20b · Optimized for consumer-grade hardware, fast reasoning (On-demand + Dedicated)

🏢 Enterprise Fine-Tuning

🥇 Cohere Command R 08-2024 · T-Few + Vanilla fine-tuning on dedicated AI clusters (Dedicated only)
🥈 Meta Llama 3.3 70B · LoRA fine-tuning, best 70B-class text performance (On-demand + Dedicated)

🌐 Multilingual Applications

🥇 Cohere Command A · Native multilingual support, 256K context, high-throughput production (On-demand + Dedicated)
🥈 Meta Llama 4 Scout / Maverick · Open-weight multilingual models with strong cross-language performance (On-demand + Dedicated)
🔤 Cohere Embed Multilingual 3 / Embed 4 · Semantic search in 100+ languages; Gen 4 adds multimodal (On-demand + Dedicated)
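The catalog pairs embedding models (Cohere Embed) with a reranker (Rerank 3.5) for two-stage retrieval: an embedding model shortlists documents by vector similarity, then the reranker reorders the shortlist. A minimal, provider-agnostic sketch of the similarity stage in plain Python; the vectors and helper names are illustrative, not OCI API calls:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, doc_vecs, k=2):
    # Rank documents by similarity to the query; a reranker
    # (e.g. Rerank 3.5) would then reorder this shortlist.
    scored = sorted(doc_vecs.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

# Toy 3-dimensional vectors standing in for real embeddings.
docs = {"a": [1.0, 0.0, 0.0], "b": [0.9, 0.1, 0.0], "c": [0.0, 1.0, 0.0]}
print(top_k([1.0, 0.05, 0.0], docs))  # → ['a', 'b']
```

In production, the vectors come from an embedding call (e.g. Embed 4 at 100+ languages) and the shortlist, not the full corpus, goes to the reranker, which is why the two models are sold as a pair.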

📄 Long-context Document Analysis

🥇 xAI Grok 4.1 Fast · 2M tokens; process full codebases or entire document archives in one pass (On-demand, ext.)
🥈 Google Gemini 2.5 Pro · 1M+ context with multimodal; ideal for large PDFs, reports, mixed media (On-demand, ext.)
🥉 xAI Grok 4 Fast · 2M context⁶ at optimized cost; great for batch document workloads (On-demand, ext.)

🇪🇺 EU Data Sovereignty (Frankfurt · London)

On-demand + Dedicated — EU-FRA & EU-LON
🥇 Cohere Command A · 256K context, multilingual, RAG + agentic; widest EU access (EU-FRA + EU-LON)
🥈 Meta Llama 3.3 70B · Open-weight, LoRA fine-tunable, strong text performance (EU-FRA + EU-LON)
🥉 OpenAI gpt-oss-120b / 20b · OpenAI-compatible API, advanced reasoning; Frankfurt on-demand + dedicated (EU-FRA + EU-LON)
Dedicated Only — EU-FRA & EU-LON
🔒 Cohere Command A Reasoning · Advanced reasoning over documents; tenancy-exclusive GPUs in both EU regions (Dedicated, EU-FRA + EU-LON)
🔒 Meta Llama 4 Maverick / Scout · Latest Llama 4 multimodal models (Dedicated, EU-LON only)
🔤 Cohere Embed 3 / 4 + Rerank 3.5 · Semantic search and reranking with EU data residency (Dedicated, EU-FRA + EU-LON)
On-demand only (ext.) — EU-FRA
🌐 Google Gemini 2.5 Pro / Flash / Flash-Lite · External API routed through the EU-FRA endpoint; data does not reside on OCI hardware (On-demand ext., EU-FRA)
⚠️ xAI Grok · No EU presence: all Grok models are US-only (Ashburn, Chicago, Phoenix) and unsuitable for EU data residency requirements (Not available in EU)
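Region availability like the above can be checked programmatically rather than by eye. A hypothetical sketch filtering a catalog file such as the repository's models.json for EU regions; the entries are condensed from this page, and the field names (`regions`, `modes`) are assumptions about the schema, not confirmed:

```python
# Hypothetical catalog entries; field names are illustrative,
# not the confirmed schema of models.json.
CATALOG = [
    {"name": "Cohere Command A", "regions": ["EU-FRA", "EU-LON", "US-CHI"],
     "modes": ["on-demand", "dedicated"]},
    {"name": "xAI Grok 4.1 Fast", "regions": ["US-ASH", "US-CHI", "US-PHX"],
     "modes": ["on-demand"]},
    {"name": "Cohere Command A Reasoning", "regions": ["EU-FRA", "EU-LON"],
     "modes": ["dedicated"]},
]

def eu_eligible(catalog, dedicated_only=False):
    """Models available in an EU region, optionally dedicated-capable."""
    eu = {"EU-FRA", "EU-LON"}
    return [m["name"] for m in catalog
            if eu & set(m["regions"])
            and (not dedicated_only or "dedicated" in m["modes"])]

print(eu_eligible(CATALOG))
# → ['Cohere Command A', 'Cohere Command A Reasoning']
```

The same filter with `dedicated_only=True` implements the stricter residency posture: only models whose weights run on tenancy-exclusive OCI GPUs, which excludes the externally routed Gemini endpoints.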

🏗️ Dedicated AI Clusters

🔒 Required for Fine-Tuning · Fine-tuning jobs run exclusively on dedicated GPU clusters; Cohere T-Few/Vanilla and Meta LoRA cannot run on-demand (Dedicated only: Cohere Command R, Llama 3.3 70B)
🏢 Data Residency & Compliance · Tenancy-exclusive GPUs; your data never shares hardware. Suited for regulated industries (GDPR, HIPAA, financial) (Dedicated: all Cohere, all Meta, OpenAI gpt-oss)
⚠️ Not Available for Google & xAI · Google Gemini and xAI Grok route through external APIs; dedicated clusters are not supported for these providers (On-demand ext. only)

⚡ On-demand (No Cluster Needed)

🚀 Google Gemini & xAI Grok: always on-demand · External API call; no cluster provisioning, instant availability, pay-per-token billing (On-demand, ext.)
🌍 Cohere & OpenAI gpt-oss: on-demand in select regions · Available without a dedicated cluster; ideal for PoC and variable workloads (On-demand + Dedicated)
📝 Fine-tuning requires Dedicated · On-demand mode supports inference only; to fine-tune a model, first provision a dedicated AI cluster (Dedicated required)
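The serving-mode rules above (fine-tuning always needs a dedicated cluster; Google and xAI are external on-demand only) are mechanical enough to encode. A sketch of that decision under those assumptions; provider names match this page, everything else is illustrative:

```python
# Providers routed via external APIs: on-demand only, no dedicated clusters.
EXTERNAL_PROVIDERS = {"google", "xai"}

def required_mode(provider: str, fine_tune: bool) -> str:
    """Pick a serving mode per the rules on this page (sketch, not an API)."""
    provider = provider.lower()
    if fine_tune:
        if provider in EXTERNAL_PROVIDERS:
            raise ValueError(f"{provider}: fine-tuning not supported on OCI")
        return "dedicated"          # fine-tuning runs only on dedicated clusters
    if provider in EXTERNAL_PROVIDERS:
        return "on-demand (ext.)"   # external call; no cluster can be provisioned
    return "on-demand"              # inference works without a cluster

print(required_mode("cohere", fine_tune=True))   # → dedicated
print(required_mode("xai", fine_tune=False))     # → on-demand (ext.)
```

A real deployment would also check region availability (see the EU section), since a model can support dedicated mode in one region and not another.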
LEGEND
Tiers: Flagship / Best-in-class · Balanced / Advanced · Speed / Efficiency · Lightweight / Budget · Specialized
Context badges: 2M / 1M+ = ≥ 1M tokens · 256K = 192K–512K tokens · 128K = 128K tokens
✓ = feature supported · ✗ = feature not supported
MoE = Mixture of Experts (sparse activation)
Region examples: US-CHI = on-demand + dedicated · EU-FRA = dedicated AI clusters only · US-ASH = on-demand / external call only

¹ Parameter counts are shown only when officially disclosed by the provider. Proprietary models (Google Gemini, xAI Grok) do not publish parameter counts and are omitted. "—" means not publicly disclosed.

² Fine-tuning on OCI uses dedicated AI clusters (GPU resources belonging exclusively to your tenancy). Cohere supports T-Few & Vanilla strategies; Meta Llama supports LoRA.

³ Retired/deprecated models (Command R legacy, Llama 3 70B, Llama 3.1 70B) are omitted from the main tables.

⁴ Model Import feature (GA 2025) lets you bring your own LLMs from Hugging Face or OCI Object Storage.

⁵ Data sources (OCI Official Documentation, last updated 13 March 2026): Pretrained Models · Models by Region · Inferencing Modes · Model Import

⁶ Grok 4.1 Fast context window (2M tokens) is confirmed by OCI documentation. Grok 4 and Grok 4 Fast context windows are not disclosed in OCI official docs.

AI-generated content — not an official Oracle document. This page was assembled with AI assistance from OCI public documentation. Data may contain errors or be out of date. Always verify against docs.oracle.com before making production decisions.
GitHub repository: models.json @ enricopesce