Key Takeaways
- Llama 3.1 405B model has 405 billion parameters
- Llama 3.1 70B model contains 70 billion parameters with a 128K-token context length
- Llama 2 70B uses Grouped-Query Attention (GQA) with 8 key-value heads
- Llama 3 was pre-trained on over 15 trillion tokens, with the largest runs using up to 16K H100 GPUs
- Llama 3.1 405B was trained with roughly 3.8×10²⁵ FLOPs of compute using a custom data pipeline
- Llama 2 70B pre-trained on 2 trillion tokens
- Llama 3's 70B model achieved 86.0% on the MMLU benchmark
- Llama 3.1 405B scores 88.6% on MMLU 5-shot
- Llama 2 70B attains 68.9% on MMLU
- Llama models surpassed 350 million total downloads on Hugging Face by August 2024
- The Llama model family went on to reach 1 billion total downloads by early 2025
- Llama 3.1 models reach 100M+ monthly active users through partner platforms
- Llama 3 70B outperforms GPT-3.5 on 7 of 9 reported benchmarks
- Llama 3.1 405B improves on the early Llama 3 405B preview checkpoint's MMLU score
- Llama 2 70B beats PaLM 540B on 5 commonsense benchmarks
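The ~3.8×10²⁵ FLOPs figure in the takeaways is consistent with the widely used ≈6ND rule of thumb (about 6 FLOPs per parameter per training token). A quick sanity check, assuming roughly 15.6T training tokens for the 405B model (the exact token count is an assumption here):

```python
# Sanity-check the reported ~3.8e25 training FLOPs using the
# common ~6*N*D approximation (6 FLOPs per parameter per token).
params = 405e9      # Llama 3.1 405B parameter count
tokens = 15.6e12    # assumed ~15.6T training tokens
flops = 6 * params * tokens
print(f"{flops:.2e}")  # ~3.79e+25, in line with the reported figure
```

The close match suggests the reported compute budget is simply the 6ND estimate applied to the published parameter and token counts.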
These Llama AI statistics cover the key models, benchmark results, usage, and download figures.
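The Grouped-Query Attention mentioned above shares each key/value head across a group of query heads, which shrinks the KV cache without retraining-scale cost. A minimal sketch of the head grouping, assuming 8 KV heads and 64 query heads (the configuration used by the 70B models; the counts here are illustrative):

```python
# Minimal sketch of Grouped-Query Attention (GQA) head grouping:
# n_q query heads are split into n_kv groups, and every query head
# in a group attends using the same shared key/value head.
n_q_heads = 64   # query heads (assumed, per the 70B models)
n_kv_heads = 8   # shared key/value heads under GQA
group_size = n_q_heads // n_kv_heads  # query heads per KV head

# Map each query head index to the KV head it reads from.
kv_for_q = [q // group_size for q in range(n_q_heads)]
print(group_size)     # 8 query heads share each KV head
print(kv_for_q[:10])  # [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# The KV cache shrinks by the same factor versus full multi-head attention.
print(f"KV cache reduction: {n_q_heads // n_kv_heads}x")
```

With 8 query heads per KV head, the KV cache is 8× smaller than under standard multi-head attention, which is the main inference-time benefit of GQA.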
Contents
- Architecture and Parameters
  - Interpretation
- Community Adoption
  - Interpretation
- Comparisons and Rankings
  - Interpretation
- Evaluation Benchmarks
  - Interpretation
- Training Resources
  - Interpretation
Sources & References
- Meta AI (ai.meta.com)
- Hugging Face (huggingface.co)
- Llama (llama.meta.com)
- arXiv (arxiv.org)
- LMSYS (lmsys.org)
- LMSYS Chatbot Arena (arena.lmsys.org)
- GitHub (github.com)
- Kaggle (kaggle.com)






