About OllamaModels
OllamaModels.com helps you find the perfect local AI model for your hardware. We provide detailed system requirements, performance benchmarks, and personalized recommendations based on your computer's capabilities.
Hardware Detection
We detect your RAM, CPU cores, and GPU to provide accurate compatibility assessments.
Performance Data
Real-world tokens/second benchmarks for different hardware configurations.
Privacy First
All hardware detection happens in your browser. We don't collect or store your data.
How It Works
1. Hardware Detection: Using browser APIs like navigator.deviceMemory, navigator.hardwareConcurrency, and WebGPU, we detect your system's capabilities directly in your browser (see the sketch after this list).
2. Model Matching: We compare your hardware against each model's requirements for different quantization levels (Q4, Q8, FP16, etc.) to determine compatibility.
3. Performance Estimation: Based on your GPU tier and available memory, we estimate tokens/second you can expect for each model.
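To make steps 1 and 2 concrete, here is a minimal browser-side sketch, assuming a TypeScript front end. The HardwareInfo shape and meetsRequirement helper are hypothetical names used for illustration, not the site's actual code; navigator.deviceMemory is Chromium-only and capped at 8, so it should be read as a lower bound.

```typescript
// Minimal sketch (not the site's actual implementation).
interface HardwareInfo {
  ramGB: number | null;    // from navigator.deviceMemory (Chromium-only, capped at 8)
  cpuCores: number | null; // logical cores, from navigator.hardwareConcurrency
  hasWebGPU: boolean;      // true if a WebGPU adapter could be acquired
}

async function detectHardware(): Promise<HardwareInfo> {
  // deviceMemory and gpu are not in every browser's Navigator type, so widen it here
  const nav = navigator as Navigator & { deviceMemory?: number; gpu?: any };
  let hasWebGPU = false;
  if (nav.gpu) {
    // requestAdapter() resolves to null when no suitable GPU is available
    hasWebGPU = (await nav.gpu.requestAdapter()) !== null;
  }
  return {
    ramGB: nav.deviceMemory ?? null,
    cpuCores: nav.hardwareConcurrency ?? null,
    hasWebGPU,
  };
}

// Step 2, simplified: a model/quantization combination "fits" if its estimated
// memory footprint is below the detected RAM (real matching would also weigh VRAM).
function meetsRequirement(hw: HardwareInfo, requiredGB: number): boolean {
  return hw.ramGB !== null && hw.ramGB >= requiredGB;
}
```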
Understanding Quantization
| Quantization | Description | Trade-off |
|---|---|---|
| FP16 | Full 16-bit precision | Best quality, largest size |
| Q8_0 | 8-bit quantization | Near-original quality, ~50% size |
| Q5_K_M | 5-bit with K-quant optimization | Good quality, ~35% size |
| Q4_K_M | 4-bit with K-quant optimization | Balanced, ~25% size |
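As a rough rule of thumb (an assumption for illustration, not the site's exact formula), a model's memory footprint is roughly parameters × bits-per-weight ÷ 8, plus some overhead for the KV cache and runtime buffers:

```typescript
// Back-of-the-envelope estimate; the ~20% overhead factor is an assumption.
function approxModelSizeGB(paramsBillions: number, bitsPerWeight: number): number {
  const weightsGB = (paramsBillions * 1e9 * bitsPerWeight) / 8 / 1e9;
  return weightsGB * 1.2; // headroom for KV cache and runtime buffers
}

// Example: a 7B-parameter model
approxModelSizeGB(7, 16); // FP16       → ~16.8 GB
approxModelSizeGB(7, 8);  // Q8_0       → ~8.4 GB
approxModelSizeGB(7, 4);  // Q4 (approx) → ~4.2 GB
```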
What is Ollama?
Ollama is a tool for running large language models locally on your computer. It simplifies the process of downloading, running, and managing LLMs without needing cloud services or API keys.
Quick start:
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.2
ollama run llama3.2
Disclaimer
OllamaModels.com is an independent project and is not affiliated with, endorsed by, or sponsored by Ollama, Meta, Google, Microsoft, Mistral AI, or any other model provider. Performance estimates are based on community benchmarks and may vary based on your specific hardware configuration, software versions, and system load.