# Model Comparison Chart

Compare models by parameter count, minimum VRAM, or context window.
| Model | Parameters | Min VRAM | Context Window (tokens) | Use Cases |
|---|---|---|---|---|
| Llama 3.2 1B | 1B | 2GB | 128,000 | edge, rewrite, assistant-lite |
| Phi-3 Mini | 3.8B | 3GB | 4,096 | edge, chat, reasoning, low-latency |
| Llama 3.2 3B | 3B | 3GB | 128,000 | chat, summary, edge, tool-use |
| Mistral 7B | 7B | 5GB | 32,768 | chat, instruction-following, function-calling |
| DeepSeek Coder 6.7B | 6.7B | 5GB | 16,384 | coding, code-review, code-completion |
| Qwen 2.5 7B | 7B | 5GB | 32,768 | chat, reasoning, multilingual, json |
| Neural Chat 7B | 7B | 5GB | 32,768 | chat, assistant, customer-support |
| Yi 6B | 6B | 5GB | 4,096 | bilingual, chat, general |
| Llama 3.1 8B | 8B | 6GB | 128,000 | chat, general, instruction-following, rag |
| Qwen 2.5 Coder 7B | 7.6B | 6GB | 32,768 | coding, refactor, test-generation, agents |
| Gemma 2 9B | 9B | 7GB | 8,192 | chat, analysis, summarization |
| Mistral NeMo 12B | 12B | 8GB | 128,000 | chat, coding, reasoning, long-context |
| Solar 10.7B | 10.7B | 8GB | 4,096 | chat, single-turn, general |
| Phi-3 Medium | 14B | 10GB | 128,000 | reasoning, chat, analysis |
| Qwen 2.5 14B | 14B | 10GB | 32,768 | reasoning, agent, multilingual, json |
| Codestral 22B | 22B | 16GB | 32,768 | coding, fim, repository-analysis |
| Gemma 2 27B | 27B | 20GB | 8,192 | analysis, reasoning, enterprise-chat |
| DeepSeek Coder 33B | 33B | 22GB | 16,384 | coding, repository-analysis, test-generation |
| Yi 34B | 34B | 24GB | 4,096 | analysis, bilingual, knowledge |
| Llama 3.1 70B | 70B | 40GB | 128,000 | advanced-reasoning, agent, analysis, multilingual |
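The chart above can also be filtered and sorted programmatically, e.g. to answer "what is the largest model that fits my GPU?". A minimal Python sketch, using a hand-copied subset of the rows; the tuple layout and the `models_that_fit` helper are illustrative conventions for this example, not any library's API:

```python
# Hypothetical in-code copy of a few rows from the chart:
# (name, parameters in billions, min VRAM in GB, context window in tokens)
MODELS = [
    ("Llama 3.2 1B", 1.0, 2, 128_000),
    ("Gemma 2 9B", 9.0, 7, 8_192),
    ("Mistral NeMo 12B", 12.0, 8, 128_000),
    ("Yi 34B", 34.0, 24, 4_096),
    ("Llama 3.1 70B", 70.0, 40, 128_000),
]

def models_that_fit(vram_gb, models=MODELS):
    """Return the models whose minimum VRAM fits the given budget,
    largest parameter count first."""
    fits = [m for m in models if m[2] <= vram_gb]
    return sorted(fits, key=lambda m: m[1], reverse=True)

# Example: candidates for an 8GB GPU, biggest first.
for name, params, vram, ctx in models_that_fit(8):
    print(f"{name}: {params}B params, {vram}GB VRAM, {ctx:,}-token context")
```

The same pattern extends to sorting by context window (`key=lambda m: m[3]`) or any other column.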