About OllamaModels
OllamaModels.com helps you find the perfect local AI model for your hardware. We provide detailed system requirements, performance benchmarks, and personalized recommendations based on your computer's capabilities.
Hardware Detection
We detect your RAM, CPU cores, and GPU to provide accurate compatibility assessments.
Performance Data
Real-world tokens/second benchmarks for different hardware configurations.
Privacy First
All hardware detection happens in your browser. We don't collect or store your data.
How It Works
1. Hardware Detection: Using browser APIs like navigator.deviceMemory, navigator.hardwareConcurrency, and WebGPU, we detect your system's capabilities directly in your browser (see the sketch after this list).
2. Model Matching: We compare your hardware against each model's requirements for different quantization levels (Q4, Q8, FP16, etc.) to determine compatibility.
3. Performance Estimation: Based on your GPU tier and available memory, we estimate tokens/second you can expect for each model.
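To make steps 1 and 2 concrete, here is a minimal browser-side sketch, assuming a TypeScript front end. The HardwareInfo shape and meetsRequirement helper are hypothetical names used for illustration, not the site's actual code; navigator.deviceMemory is Chromium-only and capped at 8, so it should be read as a lower bound.

```typescript
// Minimal sketch (not the site's actual implementation).
interface HardwareInfo {
  ramGB: number | null;    // from navigator.deviceMemory (Chromium-only, capped at 8)
  cpuCores: number | null; // logical cores, from navigator.hardwareConcurrency
  hasWebGPU: boolean;      // true if a WebGPU adapter could be acquired
}

async function detectHardware(): Promise<HardwareInfo> {
  // deviceMemory and gpu are not in every browser's Navigator type, so widen it here
  const nav = navigator as Navigator & { deviceMemory?: number; gpu?: any };
  let hasWebGPU = false;
  if (nav.gpu) {
    // requestAdapter() resolves to null when no suitable GPU is available
    hasWebGPU = (await nav.gpu.requestAdapter()) !== null;
  }
  return {
    ramGB: nav.deviceMemory ?? null,
    cpuCores: nav.hardwareConcurrency ?? null,
    hasWebGPU,
  };
}

// Step 2, simplified: a model/quantization combination "fits" if its estimated
// memory footprint is below the detected RAM (real matching would also weigh VRAM).
function meetsRequirement(hw: HardwareInfo, requiredGB: number): boolean {
  return hw.ramGB !== null && hw.ramGB >= requiredGB;
}
```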
Understanding Quantization
| Quantization | Description | Trade-off |
|---|---|---|
| FP16 | Full 16-bit precision | Best quality, largest size |
| Q8_0 | 8-bit quantization | Near-original quality, ~50% size |
| Q5_K_M | 5-bit with K-quant optimization | Good quality, ~35% size |
| Q4_K_M | 4-bit with K-quant optimization | Balanced, ~25% size |
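As a rough rule of thumb (an assumption for illustration, not the site's exact formula), a model's memory footprint is roughly parameters × bits-per-weight ÷ 8, plus some overhead for the KV cache and runtime buffers:

```typescript
// Back-of-the-envelope estimate; the ~20% overhead factor is an assumption.
function approxModelSizeGB(paramsBillions: number, bitsPerWeight: number): number {
  const weightsGB = (paramsBillions * 1e9 * bitsPerWeight) / 8 / 1e9;
  return weightsGB * 1.2; // headroom for KV cache and runtime buffers
}

// Example: a 7B-parameter model
approxModelSizeGB(7, 16); // FP16       → ~16.8 GB
approxModelSizeGB(7, 8);  // Q8_0       → ~8.4 GB
approxModelSizeGB(7, 4);  // Q4 (approx) → ~4.2 GB
```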
What is Ollama?
Ollama is a tool for running large language models locally on your computer. It simplifies the process of downloading, running, and managing LLMs without needing cloud services or API keys.
Quick start:
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.2
ollama run llama3.2
Disclaimer
OllamaModels.com is an independent project and is not affiliated with, endorsed by, or sponsored by Ollama, Meta, Google, Microsoft, Mistral AI, or any other model provider. Performance estimates are based on community benchmarks and may vary based on your specific hardware configuration, software versions, and system load.