OCI · GENERATIVE AI

Model Reference 2026

Source: OCI Official Documentation  |  March 2026  |  30+ models

Provider Context
5 Model Providers  ·  24 Chat Models (Active)  ·  9 Embedding Models  ·  1 Rerank Model
Chat Models — Cohere Family
Model Name Model ID Tier Parameters Context Window Multimodal Fine-tunable Tool Use / Agents Reasoning RAG Optimized Status Best For
Cohere
Command A Reasoning
cohere.command-a-reasoning ★ Flagship Reasoning 111B 256K ✓ Advanced ● GA (Aug 2025) Complex Q&A, multi-step reasoning, document analysis, structured arguments
Cohere
Command A Vision
cohere.command-a-vision ★ Flagship Multimodal 112B 128K ✓ Images, Charts, Docs ● GA (Jul 2025) Enterprise document understanding with charts & images
Cohere
Command A
cohere.command-a-03-2025 ◉ Flagship Chat 111B 256K ✓ Advanced ● GA (Mar 2025) Agentic enterprise tasks, RAG, multilingual, high-throughput production
Cohere
Command R+ 08-2024
cohere.command-r-plus-08-2024 ◉ Advanced 104B 128K ● Active Complex specialized tasks, Q&A, sentiment, multilingual RAG
Cohere
Command R 08-2024
cohere.command-r-08-2024 ▷ Standard 35B 128K ✓ T-Few / Vanilla ✓ Optimized ● Active RAG pipelines, info retrieval, cost-efficient enterprise chat
Chat Models — Google Gemini Family
Model Name Model ID Tier Context Window Multimodal Thinking / Reasoning Speed Profile Fine-tunable on OCI Status Best For
Google
Gemini 2.5 Pro
google.gemini-2.5-pro ★ Flagship 1M+ ✓ Text, Image, Code, Audio, Video ✓ Advanced Reasoning Deep / Accurate ● GA Most complex multimodal problems, large dataset analysis, SOTA reasoning tasks
Google
Gemini 2.5 Flash
google.gemini-2.5-flash ◉ Balanced 1M ✓ Text, Image, Code, Audio, Video ✓ Thinking features Fast + Smart ● GA Balanced workloads needing speed + intelligence, complex applications
Google
Gemini 2.5 Flash-Lite
google.gemini-2.5-flash-lite ○ Budget / Fast 1M ✓ Text, Image, Code, Audio, Video Lowest Latency ● GA High-volume, simpler tasks; cost-sensitive production workloads
Chat Models — Meta Llama Family
Model Name Model ID Tier Architecture Total Params Active Params (MoE) Context Window Multimodal Fine-tunable (LoRA) Tool Use Status Best For
Meta
Llama 4 Maverick
meta.llama-4-maverick-17b-128e-instruct-fp8 ★ Flagship MoE MoE · 128 Experts ~400B 17B active 512K ✓ Text + Image ● GA (2025) Multimodal understanding, multilingual, coding, agentic systems, large-scale inference
Meta
Llama 4 Scout
meta.llama-4-scout-17b-16e-instruct ◉ Efficient MoE MoE · 16 Experts ~109B 17B active 192K ✓ Text + Image ● GA (2025) Smaller GPU deployments, efficient multimodal, multilingual, coding
Meta
Llama 3.3 70B
meta.llama-3.3-70b-instruct ◉ Best Text 70B Dense 70B 70B 128K ✗ Text only ✓ LoRA ● GA Best text-only 70B tasks; outperforms 3.1 70B and 3.2 90B on text benchmarks
Meta
Llama 3.2 90B Vision
meta.llama-3.2-90b-vision-instruct ◉ Vision Flagship Dense + Vision 90B 90B 128K ✓ Text + Image ● Active Multimodal understanding with large model capacity, image reasoning
Meta
Llama 3.2 11B Vision
meta.llama-3.2-11b-vision-instruct ▷ Compact Vision Dense + Vision 11B 11B 128K ✓ Text + Image ● Active (Dedicated only) Cost-efficient multimodal; resource-constrained deployments
Meta
Llama 3.1 405B
meta.llama-3.1-405b-instruct ★ Largest Open Dense 405B 405B 128K ✗ Text only ● Active Highest text quality open model; complex reasoning, advanced generation
Chat Models — OpenAI gpt-oss Family
Model Name Model ID Tier Parameters Context Window Reasoning / Agentic Tool Use Open Source Fine-tunable on OCI Status Best For
OpenAI
gpt-oss-120b
openai.gpt-oss-120b ★ Flagship OSS 120B 128K ✓ Advanced Reasoning + Agentic ✓ Advanced Tool Use ✓ Open weights ● GA Reasoning, agentic tasks; outperforms similar-size open models; OpenAI-compatible API
OpenAI
gpt-oss-20b
openai.gpt-oss-20b ▷ Efficient OSS 20B 128K ✓ Reasoning + Agentic ✓ Open weights ● GA Efficient consumer-hardware-optimized reasoning; agentic tasks at lower cost
Chat Models — xAI Grok Family
Model Name Model ID Tier Context Window Multimodal Thinking / Chain-of-Thought Coding Focus Speed Profile Domain Knowledge Agentic Status Best For
xAI
Grok 4
xai.grok-4 ★ Flagship 128K ✓ Text + Image Standard ✓ Finance, Health, Law, Science ● GA Enterprise data extraction, coding, summarization with deep domain knowledge
xAI
Grok 4 Fast
xai.grok-4-fast-reasoning
xai.grok-4-fast-non-reasoning
▷ Fast Flagship 2M ✓ Text + Image Speed Optimized ✓ Finance, Health, Law, Science ● GA Same capability as Grok 4 with a 2M context window; cost- and speed-optimized for production
xAI
Grok 4.1 Fast
xai.grok-4-1-fast-reasoning
xai.grok-4-1-fast-non-reasoning
★ Agentic Flagship 2M ✓ Text + Image ✓ Advanced High Speed ✓ Parallel Tool Calling ● GA Complex agentic systems, customer support, research — 3x fewer hallucinations vs Grok 4
xAI
Grok 3
xai.grok-3 ◉ Standard 131K Standard ✓ Finance, Health, Law, Science ● GA General enterprise tasks, data extraction, text summarization
xAI
Grok 3 Fast
xai.grok-3-fast ▷ Standard Fast 131K Speed Optimized ● GA High-throughput enterprise tasks at Grok 3 quality level
xAI
Grok 3 Mini
xai.grok-3-mini ○ Lightweight Thinker 131K ✓ Traces exposed Standard ● GA Logic-based tasks not requiring deep domain knowledge; transparent thinking traces
xAI
Grok 3 Mini Fast
xai.grok-3-mini-fast ○ Lightweight Fast 131K ✓ Traces exposed Fastest ● GA Low-latency logic tasks at minimum cost
xAI
Grok Code Fast 1
xai.grok-code-fast-1 ◆ Coding Specialist 256K ✓ Summarized traces ✓ Specialized Very Fast ✓ Agentic Coding ● GA (Aug 2025) TypeScript, Python, Java, Rust, C++, Go; zero-to-one projects, bug fixes, agentic coding loops
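As the dual model IDs above show, Grok 4 Fast and Grok 4.1 Fast each ship a -reasoning and a -non-reasoning variant. A minimal sketch of resolving the right ID from the table's naming pattern; the helper name and flag are illustrative, not part of any official SDK:

```python
# Sketch: resolve an OCI model ID for the dual-variant Grok Fast models.
# Follows the naming pattern shown in the table above; illustrative only.
def grok_fast_model_id(family: str, reasoning: bool) -> str:
    """family: 'grok-4' or 'grok-4-1', matching the OCI ID prefixes."""
    suffix = "reasoning" if reasoning else "non-reasoning"
    return f"xai.{family}-fast-{suffix}"

# e.g. grok_fast_model_id("grok-4-1", reasoning=True)
# -> "xai.grok-4-1-fast-reasoning"
```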
Embedding Models
Model Name Model ID Generation Multimodal (Image) Language Scope Size Variant Use Case Status
Cohere
Embed 4
cohere.embed-v4.0 Gen 4 · Latest ✓ Text + Image (base64) Multilingual Full Latest multimodal embeddings; text & image semantic search ● Active
Cohere
Embed English Image 3
cohere.embed-english-image-v3.0 Gen 3 ✓ Text + Image English Full English-only text+image semantic search ● Active
Cohere
Embed English Light Image 3
cohere.embed-english-light-image-v3.0 Gen 3 ✓ Text + Image English Light Cost-efficient English text+image embeddings ● Active
Cohere
Embed Multilingual Image 3
cohere.embed-multilingual-image-v3.0 Gen 3 ✓ Text + Image Multilingual Full Global multilingual text+image semantic search ● Active
Cohere
Embed Multilingual Light Image 3
cohere.embed-multilingual-light-image-v3.0 Gen 3 ✓ Text + Image Multilingual Light Budget multilingual text+image embeddings ● Active
Cohere
Embed English 3
cohere.embed-english-v3.0 Gen 3 ✗ Text only English Full Pure text English semantic search, classification, clustering ● Active
Cohere
Embed English Light 3
cohere.embed-english-light-v3.0 Gen 3 ✗ Text only English Light Cost-efficient English text embeddings at scale ● Active
Cohere
Embed Multilingual 3
cohere.embed-multilingual-v3.0 Gen 3 ✗ Text only Multilingual Full Global enterprise text semantic search in 100+ languages ● Active
Cohere
Embed Multilingual Light 3
cohere.embed-multilingual-light-v3.0 Gen 3 ✗ Text only Multilingual Light Affordable multilingual text embeddings at volume ● Active
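The Gen 3 embedding IDs above follow a regular pattern along the table's three axes (language scope, size variant, image support). A small sketch that assembles an ID from those axes; the helper is illustrative and covers only the Gen 3 family:

```python
# Sketch: build a Cohere Gen 3 embedding model ID from the three table
# axes (language scope, size variant, image support). Illustrative only;
# Embed 4 (cohere.embed-v4.0) does not follow this naming pattern.
def cohere_embed_v3_id(multilingual: bool, light: bool, image: bool) -> str:
    parts = ["cohere.embed"]
    parts.append("multilingual" if multilingual else "english")
    if light:
        parts.append("light")
    if image:
        parts.append("image")
    parts.append("v3.0")
    return "-".join(parts)

# e.g. cohere_embed_v3_id(multilingual=True, light=False, image=True)
# -> "cohere.embed-multilingual-image-v3.0"
```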
Rerank Model
Model Name Model ID Input Output Use Case Status
Cohere
Rerank 3.5
cohere.rerank.v3-5 Query + List of texts Ordered array with relevance scores RAG pipelines, document ranking, search result reordering, precision improvement ● Active
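The rerank I/O contract above (a query plus a list of texts in, an array ordered by relevance score out) can be mimicked locally. The toy token-overlap scorer below only illustrates the response shape; it is not Cohere Rerank 3.5 and its scores mean nothing beyond this sketch:

```python
# Toy illustration of the rerank contract: query + texts -> ordered array
# with relevance scores. The scoring is a crude token-overlap heuristic,
# NOT the Rerank 3.5 model; it only mimics the shape of the output.
def toy_rerank(query, documents):
    q_tokens = set(query.lower().split())
    results = []
    for idx, doc in enumerate(documents):
        overlap = len(q_tokens & set(doc.lower().split()))
        score = overlap / max(len(q_tokens), 1)  # crude relevance in [0, 1]
        results.append({"index": idx, "relevance_score": score})
    # Highest-scoring document first, as a rerank endpoint returns
    results.sort(key=lambda r: r["relevance_score"], reverse=True)
    return results

docs = [
    "OCI dedicated AI clusters for fine-tuning",
    "Reranking reorders search results by relevance to the query",
    "Embedding models map text to vectors",
]
ranked = toy_rerank("rerank search results by relevance", docs)
# ranked[0] points at the second document, the only one sharing terms
# with the query
```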
Use-Case Selection Guide

🔍 RAG / Document Search

🥇
Cohere Command A Reasoning
256K context, built for RAG, advanced reasoning over documents
🥈
Cohere Command R 08-2024
RAG-optimized, 128K context, fine-tunable, cost-efficient
🥉
Google Gemini 2.5 Pro
1M+ context — handle entire documents in one pass

🤖 Agentic / Tool-Use Workflows

🥇
xAI Grok 4.1 Fast
2M context, parallel tool calls, 3x lower hallucinations
🥈
Cohere Command A
256K context, best throughput for Cohere agentic tasks
🥉
OpenAI gpt-oss-120b
Advanced tool use, reasoning, OpenAI-compatible API

💻 Code Generation

🥇
xAI Grok Code Fast 1
Specialized coding model — plan, write, test, debug loop
🥈
Meta Llama 4 Maverick
MoE, strong coding + tool-calling capabilities
🥉
Google Gemini 2.5 Pro
Top-tier code reasoning, debugging, complex architecture

🌍 Multimodal (Text + Image)

🥇
Google Gemini 2.5 Pro
Best multimodal — text, image, code, audio, video
🥈
Cohere Command A Vision
Enterprise-focused image, chart, document understanding
🥉
Meta Llama 4 Maverick
Open-weight multimodal with MoE efficiency

⚡ High-Volume / Low-Latency

🥇
Google Gemini 2.5 Flash-Lite
Fastest + cheapest in Gemini family; 1M context
🥈
xAI Grok 3 Mini Fast
Lightweight thinker, lowest latency xAI model
🥉
OpenAI gpt-oss-20b
Consumer-grade hardware optimized, fast reasoning

🏢 Enterprise Fine-Tuning

🥇
Cohere Command R 08-2024
T-Few + Vanilla fine-tuning on dedicated AI clusters
🥈
Meta Llama 3.3 70B
LoRA fine-tuning, best 70B text performance

🌐 Multilingual Applications

🥇
Cohere Command A
Native multilingual support, 256K context, high-throughput production
🥈
Meta Llama 4 Scout / Maverick
Open-weight multilingual models with strong cross-language performance
🔤
Cohere Embed Multilingual 3 / Embed 4
Semantic search in 100+ languages; Gen 4 adds multimodal

📄 Long-context Document Analysis

🥇
xAI Grok 4.1 Fast
2M tokens — process full codebases or entire document archives in one pass
🥈
Google Gemini 2.5 Pro
1M+ context with multimodal; ideal for large PDFs, reports, mixed media
🥉
xAI Grok 4 Fast
2M context at optimized cost; great for batch document workloads
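The podium picks above collapse into a simple lookup table. The category keys below are this sketch's own naming, the model IDs are copied from the tables in this document, and for Grok 4.1 Fast the -reasoning variant is chosen arbitrarily here:

```python
# Sketch: top pick per use case from the selection guide above.
# Keys are illustrative; model IDs come from this document's tables.
TOP_PICK = {
    "rag": "cohere.command-a-reasoning",
    "agentic": "xai.grok-4-1-fast-reasoning",  # -reasoning variant assumed
    "coding": "xai.grok-code-fast-1",
    "multimodal": "google.gemini-2.5-pro",
    "high_volume": "google.gemini-2.5-flash-lite",
    "fine_tuning": "cohere.command-r-08-2024",
    "multilingual": "cohere.command-a-03-2025",
    "long_context": "xai.grok-4-1-fast-reasoning",
}

def recommend(use_case: str) -> str:
    """Return the guide's first-choice model ID for a use case."""
    return TOP_PICK[use_case.lower()]
```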
LEGEND
★ Flagship / Best-in-class
◉ Balanced / Advanced
▷ Speed / Efficiency tier
○ Lightweight / Budget
◆ Specialized
2M / 1M+ = context window ≥ 1M tokens
256K = context tier covering 192K–512K tokens
128K = context window of 128K tokens
✓ = feature supported
✗ = not supported
MoE = Mixture of Experts (sparse activation)

¹ Parameter counts are shown only when officially disclosed by the provider. Proprietary models (Google Gemini, xAI Grok) do not publish parameter counts, so those cells are left blank ("—" means not publicly disclosed).

² Fine-tuning on OCI uses dedicated AI clusters (GPU resources belonging exclusively to your tenancy). Cohere supports T-Few & Vanilla strategies; Meta Llama supports LoRA.

³ Retired/deprecated models (Command R legacy, Llama 3 70B, Llama 3.1 70B) are omitted from the main tables.

⁴ Model Import feature (GA 2025) lets you bring your own LLMs from Hugging Face or OCI Object Storage.

⁵ Grok 4 Fast and Grok 4.1 Fast each expose two OCI model IDs: a -reasoning variant (thinking tokens, chain-of-thought) and a -non-reasoning variant (instant responses, no thinking tokens).

⁶ Data source: OCI Official Documentation, docs.oracle.com/en-us/iaas/Content/generative-ai/, March 2026.

GitHub Repository: @enricopesce