Find Your Perfect Local AI Model

Compare system requirements and performance benchmarks, and get recommendations tailored to your hardware.

Showing 20 of 20 models
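Choosing a model mostly comes down to fitting its memory footprint to your machine: run on GPU if the model fits in VRAM, otherwise fall back to system RAM and the CPU. A minimal sketch of that filtering logic, using a few entries from the table below (the selection rule is an illustrative assumption, not this site's actual recommendation code):

```python
# Illustrative hardware filter. The requirements mirror three
# entries from the model list; the fit rule (VRAM for GPU use,
# else RAM for CPU use) is a simplifying assumption.
MODELS = [
    {"name": "llama3.3", "ram_gb": 48, "vram_gb": 44},
    {"name": "mistral",  "ram_gb": 8,  "vram_gb": 6},
    {"name": "llama3.2", "ram_gb": 4,  "vram_gb": 3},
]

def recommend(ram_gb: float, vram_gb: float) -> list[str]:
    """Return models whose requirements fit the given hardware."""
    fits = lambda m: m["vram_gb"] <= vram_gb or m["ram_gb"] <= ram_gb
    return [m["name"] for m in MODELS if fits(m)]

# A 16GB-RAM laptop with an 8GB GPU can run the two smaller models:
print(recommend(ram_gb=16, vram_gb=8))  # ['mistral', 'llama3.2']
```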

Code Llama

Meta · 34B

Meta's specialized coding model based on Llama 2. Excellent for code completion and generation.

Category: coding · Tags: coding, programming, infill
Size: 20GB
RAM: 26GB
VRAM: 22GB
Context: 16K
Performance (tokens/sec): CPU 5 · Mid GPU 28 · High GPU 65
ollama pull codellama
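The download sizes throughout this list are much smaller than full-precision weights because Ollama pulls 4-bit (Q4) quantizations by default. A back-of-the-envelope estimate of file size from parameter count (the ~4.8 bits/weight figure is a fitted assumption covering quantization plus metadata overhead, not an official number):

```python
def approx_q4_size_gb(params_billion: float,
                      bits_per_weight: float = 4.8) -> float:
    """Rough download size of a 4-bit-quantized model.

    bits_per_weight ~4.5-5 is an assumed average for Q4-family
    quantizations including overhead; treat results as estimates.
    """
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

print(round(approx_q4_size_gb(34), 1))  # ~20GB, matching Code Llama 34B
print(round(approx_q4_size_gb(7), 1))   # ~4GB, close to Mistral 7B's 4.1GB
```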

Command R

Cohere · 35B

Cohere's model optimized for RAG and tool use. Excellent at following complex instructions.

Category: general · Tags: rag, tool-use, enterprise
Size: 20GB
RAM: 26GB
VRAM: 22GB
Context: 128K
Performance (tokens/sec): CPU 5 · Mid GPU 26 · High GPU 60
ollama pull command-r

DeepSeek R1

DeepSeek · 70B

State-of-the-art reasoning model with chain-of-thought capabilities. Excels at math, coding, and complex reasoning.

Category: reasoning · Tags: reasoning, math, coding
Size: 43GB
RAM: 48GB
VRAM: 44GB
Context: 64K
Performance (tokens/sec): CPU 2.5 · Mid GPU 14 · High GPU 42
ollama pull deepseek-r1

DeepSeek R1 Distill (Qwen)

DeepSeek · 32B

Distilled version of R1 based on Qwen. Great reasoning in a smaller package.

Category: reasoning · Tags: reasoning, distilled, efficient
Size: 19GB
RAM: 24GB
VRAM: 20GB
Context: 32K
Performance (tokens/sec): CPU 6 · Mid GPU 28 · High GPU 65
ollama pull deepseek-r1-distill-qwen

Gemma 2

Google · 27B

Google's open model built from Gemini research. Available in 2B, 9B, and 27B sizes.

Category: general · Tags: google, efficient, instruct
Size: 16GB
RAM: 20GB
VRAM: 18GB
Context: 8K
Performance (tokens/sec): CPU 7 · Mid GPU 32 · High GPU 75
ollama pull gemma2

Llama 3.2

Meta · 3B

Efficient smaller models from Meta, perfect for on-device deployment. Available in 1B and 3B sizes.

Category: small · Tags: lightweight, fast, mobile
Size: 2GB
RAM: 4GB
VRAM: 3GB
Context: 128K
Performance (tokens/sec): CPU 35 · Mid GPU 120 · High GPU 200
ollama pull llama3.2

Llama 3.2 Vision

Meta · 11B

Multimodal model that can understand images and text. Available in 11B and 90B variants.

Category: vision · Tags: multimodal, image-understanding, vision
Size: 6.5GB
RAM: 10GB
VRAM: 8GB
Context: 128K
Performance (tokens/sec): CPU 12 · Mid GPU 45 · High GPU 90
ollama pull llama3.2-vision

Llama 3.3

Meta · 70B

Meta's latest and most capable open model. Excellent for general tasks, coding, and reasoning with 128K context.

Category: general · Tags: chat, instruct, multilingual
Size: 43GB
RAM: 48GB
VRAM: 44GB
Context: 128K
Performance (tokens/sec): CPU 3 · Mid GPU 15 · High GPU 45
ollama pull llama3.3

LLaVA

LLaVA Team · 13B

Visual instruction-tuned model combining CLIP vision with Llama. Great for image understanding.

Category: vision · Tags: vision, multimodal, image-understanding
Size: 8GB
RAM: 12GB
VRAM: 10GB
Context: 4K
Performance (tokens/sec): CPU 10 · Mid GPU 40 · High GPU 85
ollama pull llava

Mistral 7B

Mistral AI · 7B

Highly efficient 7B model that punches above its weight. Great balance of speed and capability.

Category: general · Tags: efficient, fast, instruct
Size: 4.1GB
RAM: 8GB
VRAM: 6GB
Context: 32K
Performance (tokens/sec): CPU 20 · Mid GPU 80 · High GPU 150
ollama pull mistral

Mistral Small

Mistral AI · 24B

Latest Mistral model optimized for efficiency. Enterprise-grade quality in a compact size.

Category: general · Tags: enterprise, efficient, function-calling
Size: 14GB
RAM: 18GB
VRAM: 16GB
Context: 32K
Performance (tokens/sec): CPU 8 · Mid GPU 35 · High GPU 80
ollama pull mistral-small

Mixtral 8x7B

Mistral AI · 8x7B (47B)

Mixture of Experts model that activates only 2 experts per token. Fast inference with high quality.

Category: general · Tags: moe, efficient, multilingual
Size: 26GB
RAM: 32GB
VRAM: 28GB
Context: 32K
Performance (tokens/sec): CPU 5 · Mid GPU 25 · High GPU 60
ollama pull mixtral
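The "8x7B (47B)" naming can be confusing: the eight expert feed-forward stacks are not eight independent 7B models, and since only two experts are routed per token, far fewer parameters are active on each step. Mistral AI's published figures are roughly 46.7B total and 12.9B active; the shared/expert split in this sketch is back-solved from those two numbers, not taken from the model card:

```python
# Why "8x7B" is ~47B total but only ~13B active per token.
# Published Mixtral figures: ~46.7B total, ~12.9B active.
TOTAL_B, ACTIVE_B = 46.7, 12.9
N_EXPERTS, TOP_K = 8, 2

# total = shared + 8*e and active = shared + 2*e, so e = (total - active) / 6.
expert_b = (TOTAL_B - ACTIVE_B) / (N_EXPERTS - TOP_K)
shared_b = TOTAL_B - N_EXPERTS * expert_b  # attention, embeddings, router

print(f"per-expert params ~ {expert_b:.1f}B, always-active shared ~ {shared_b:.1f}B")
# Only TOP_K of the N_EXPERTS expert stacks run for any given token:
print(f"active per token ~ {shared_b + TOP_K * expert_b:.1f}B of {TOTAL_B}B")
```

This is why Mixtral's tokens/sec figures look closer to a ~13B dense model than a 47B one, while its memory requirements still reflect the full 47B of weights.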

Mxbai Embed Large

Mixedbread AI · 335M

State-of-the-art embedding model. Top performance on MTEB benchmarks.

Category: embedding · Tags: embedding, rag, retrieval
Size: 0.67GB
RAM: 2GB
VRAM: 1GB
Context: 1K
Performance (tokens/sec): CPU 300 · Mid GPU 1500 · High GPU 4000
ollama pull mxbai-embed-large

Nomic Embed Text

Nomic AI · 137M

High-quality text embedding model. Perfect for RAG, semantic search, and similarity matching.

Category: embedding · Tags: embedding, rag, search
Size: 0.27GB
RAM: 1GB
VRAM: 0.5GB
Context: 8K
Performance (tokens/sec): CPU 500 · Mid GPU 2000 · High GPU 5000
ollama pull nomic-embed-text
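Embedding models like these don't generate text; they map text to vectors, and retrieval or semantic search then ranks documents by cosine similarity to the query vector. A self-contained sketch with made-up 3-dimensional vectors (real embeddings from models like nomic-embed-text have hundreds of dimensions):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy 3-d "embeddings" -- purely illustrative values.
query = [0.9, 0.1, 0.0]
docs = {"cats": [0.8, 0.2, 0.1], "stocks": [0.1, 0.0, 0.9]}

# Rank documents by similarity to the query vector.
best = max(docs, key=lambda d: cosine(query, docs[d]))
print(best)  # 'cats' -- its vector points nearly the same direction
```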

Orca Mini

Pankaj Mathur · 7B

Compact model with strong reasoning. Good balance between size and capability.

Category: small · Tags: reasoning, efficient, instruct
Size: 4GB
RAM: 8GB
VRAM: 6GB
Context: 4K
Performance (tokens/sec): CPU 22 · Mid GPU 85 · High GPU 160
ollama pull orca-mini

Phi-3

Microsoft · 14B

Microsoft's small language model. Surprisingly capable for its size, great for resource-constrained environments.

Category: small · Tags: efficient, small, reasoning
Size: 8.2GB
RAM: 12GB
VRAM: 10GB
Context: 128K
Performance (tokens/sec): CPU 15 · Mid GPU 55 · High GPU 110
ollama pull phi3

Qwen 2.5

Alibaba · 72B

Alibaba's flagship model with excellent multilingual support. Available from 0.5B to 72B.

Category: general · Tags: multilingual, chinese, chat
Size: 44GB
RAM: 50GB
VRAM: 46GB
Context: 128K
Performance (tokens/sec): CPU 2.8 · Mid GPU 14 · High GPU 40
ollama pull qwen2.5

Qwen 2.5 Coder

Alibaba · 32B

Specialized coding model with excellent code completion and generation. Supports 92 programming languages.

Category: coding · Tags: coding, programming, code-completion
Size: 19GB
RAM: 24GB
VRAM: 20GB
Context: 128K
Performance (tokens/sec): CPU 6 · Mid GPU 30 · High GPU 70
ollama pull qwen2.5-coder

StarCoder 2

BigCode · 15B

Code LLM trained on The Stack v2. Excellent for code completion across 600+ programming languages.

Category: coding · Tags: coding, code-completion, programming
Size: 9GB
RAM: 14GB
VRAM: 11GB
Context: 16K
Performance (tokens/sec): CPU 12 · Mid GPU 50 · High GPU 100
ollama pull starcoder2

TinyLlama

TinyLlama · 1.1B

Ultra-compact 1.1B model. Perfect for testing, edge devices, and resource-limited environments.

Category: small · Tags: tiny, fast, edge
Size: 0.64GB
RAM: 2GB
VRAM: 1GB
Context: 2K
Performance (tokens/sec): CPU 60 · Mid GPU 200 · High GPU 350
ollama pull tinyllama
馃 OllamaModels.com

Find the best local AI models for your hardware. Not affiliated with Ollama.