Key Takeaways
- Llama 2 7B scores 45.3% on the MMLU benchmark (5-shot)
- Llama 2 13B scores 54.8% on MMLU
- Llama 2 70B reaches 68.9% on MMLU
- Llama 2 7B was trained on 2 trillion tokens of data
- Llama 3 models trained on over 15 trillion tokens
- Llama 3.1 405B required roughly 30.84 million H100-80GB GPU hours to pretrain
- Llama 2 7B has 7 billion parameters
- Llama 3 8B features 8 billion parameters with 32 layers
- Llama 2 70B uses 80 layers and a hidden size of 8192
- Meta reports over 1 billion downloads of Llama models overall
- Llama 3 models collectively have 3.5 billion downloads on HF
- Llama 2 70B is the base for over 1,000 fine-tuned models on Hugging Face
- Llama 3 70B generates about 50 tokens/sec on a single A100 GPU
- Llama 3 8B quantized to 4-bit exceeds 100 tokens/sec on a consumer GPU
- Llama 2 70B requires about 140 GB of VRAM for its weights alone in FP16
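The 140 GB figure falls straight out of parameter-count arithmetic: weight memory ≈ parameters × bits-per-parameter / 8, ignoring the KV cache, activations, and framework overhead. A minimal sketch (the helper name is ours, not from any library):

```python
def weight_vram_gb(n_params: float, bits_per_param: int) -> float:
    """Rough VRAM for model weights only; ignores KV cache,
    activations, and runtime overhead."""
    return n_params * bits_per_param / 8 / 1e9

print(weight_vram_gb(70e9, 16))  # Llama 2 70B in FP16 -> 140.0 GB
print(weight_vram_gb(8e9, 4))    # Llama 3 8B at 4-bit -> 4.0 GB
```

The same arithmetic explains why a 4-bit 8B model fits comfortably on a consumer GPU while a 70B FP16 model needs multiple data-center cards.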
Overall, the Llama family combines strong benchmark performance, large-scale training, and broad adoption.
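The token counts above can be turned into a rough compute estimate with the standard approximation of about 6 FLOPs per parameter per token; this is a back-of-the-envelope check, not Meta's reported accounting:

```python
def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate pretraining compute: ~6 FLOPs per parameter per token."""
    return 6 * n_params * n_tokens

# Llama 2 7B on 2 trillion tokens (figures from the takeaways above)
print(f"{train_flops(7e9, 2e12):.1e}")  # ~8.4e+22 FLOPs
```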
Architecture and Parameters
Comparisons with Other Models
Inference and Deployment
Performance on Benchmarks
Training Data and Compute
Usage and Adoption Metrics
Sources & References
- ai.meta.com
- llama.meta.com
- arena.lmsys.org
- huggingface.co
- github.com
- docs.vllm.ai
- ml-explore.github.io