Key Takeaways
- Llama 3.1 405B model has 405 billion parameters
- Llama 3 70B model contains 70 billion parameters with an 8K context length
- Llama 2 70B uses Grouped-Query Attention (GQA) with 8 key-value heads
- Llama 3 trained on over 15 trillion tokens using two custom 24K-GPU clusters
- Llama 3.1 405B trained on 3.8e25 FLOPs with custom data pipeline
- Llama 2 70B pre-trained on 2 trillion tokens
- Llama 3.1 70B achieved 86.0% on the MMLU benchmark
- Llama 3.1 405B scores 88.6% on MMLU 5-shot
- Llama 2 70B attains 68.9% on MMLU
- Llama models surpassed 350 million total downloads on Hugging Face by August 2024
- The Llama family reached 1 billion total downloads by early 2025
- Llama 3.1 models have 100M+ monthly active users via platforms
- Llama 3 70B outperforms GPT-3.5 on 7/9 benchmarks
- Llama 3.1 405B improves on the April 2024 Llama 3 405B preview checkpoint on MMLU
- Llama 2 70B beats PaLM 540B on 5 commonsense benchmarks
These Llama AI statistics cover key models, benchmarks, usage, and downloads.
Architecture and Parameters
- Llama 3.1 405B model has 405 billion parameters
- Llama 3 70B model contains 70 billion parameters with an 8K context length
- Llama 2 70B uses Grouped-Query Attention (GQA) with 8 key-value heads
- Llama 3.1 8B has 8 billion parameters and supports 128K token context
- Code Llama 34B is based on Llama 2 with 34 billion parameters specialized for code
- Llama 3.2 1B model has 1 billion parameters for edge devices
- Llama Guard 3 8B uses 8B parameters for safety classification
- Llama 3.1 70B employs SwiGLU activation and rotary positional embeddings
- Llama 2 70B has 80 layers and a hidden size of 8192
- Llama 3 8B features 32 layers and 4096 hidden dimension
- Llama 3.2 90B vision model integrates vision encoder with 90B parameters
- Llama 1 65B has 80 layers and uses RMSNorm pre-normalization
- Llama 3.1 405B uses 126 layers with a hidden size of 16384
- Llama 2 13B employs 40 layers and 5120 hidden size
- Llama Guard 2 7B has 7B parameters for content moderation
- Llama 3.2 11B multimodal has 11B parameters including vision tower
- Code Llama 7B Python variant fine-tuned on 100B Python tokens
- Llama 3 70B supports function calling with 70B params
- Llama 1 7B has 32 layers and 4096 hidden size
- Llama 3.1 8B-Instruct has instruction-tuned architecture on 8B base
- Llama 2 70B-Instruct uses supervised fine-tuning on 1M examples
- Llama 3.2 1B is a lightweight text-only model optimized for on-device inference
- Llama Guard 3 70B scales to 70B for advanced safety
- A Llama 3 405B preview checkpoint was announced in April 2024, ahead of the Llama 3.1 release
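As a rough sanity check on the layer and hidden-size figures above, the parameter count of a Llama-style decoder can be estimated from its hyperparameters. A minimal sketch, using the publicly reported Llama 2 7B configuration (the helper name is ours, and layer norms and biases are omitted for simplicity):

```python
def llama_param_count(n_layers, d_model, d_ffn, vocab, n_heads, n_kv_heads):
    """Rough dense-parameter estimate for a Llama-style decoder (norms omitted)."""
    head_dim = d_model // n_heads
    # attention: Q and output projections are full-size; K/V shrink under GQA
    attn = 2 * d_model * d_model + 2 * d_model * n_kv_heads * head_dim
    # SwiGLU MLP uses three projections: gate, up, and down
    mlp = 3 * d_model * d_ffn
    embed = vocab * d_model
    return n_layers * (attn + mlp) + 2 * embed  # untied input/output embeddings

# Llama 2 7B: 32 layers, 4096 hidden, 11008 FFN, 32 heads (standard MHA)
print(f"{llama_param_count(32, 4096, 11008, 32000, 32, 32) / 1e9:.2f}B")  # 6.74B
```

The estimate lands close to the marketed "7B" size; for GQA models such as Llama 2 70B, passing `n_kv_heads=8` shows how the K/V projections shrink relative to standard multi-head attention.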
Community Adoption
- Llama models surpassed 350 million total downloads on Hugging Face by August 2024
- The Llama family reached 1 billion total downloads by early 2025
- Llama 3.1 models have 100M+ monthly active users via platforms
- Code Llama starred 10K+ times on GitHub repositories
- Llama 3 fine-tunes hosted exceed 50K on Hugging Face Hub
- Llama 2 used in 40K+ commercial applications per Meta reports
- Llama Guard integrated in 5K+ safety pipelines on HF Spaces
- Llama 3.2 edge models deployed on 1M+ Android devices targeted
- Llama models forked 200K+ times on Hugging Face platform
- Llama 3 inference requests hit 100B+ on Groq and other providers
- Code Llama used in 20% of top Kaggle competitions 2024
- Llama 2 community fine-tunes exceed 100K variants
- Llama 3.1 405B quantized versions downloaded 10M+ times
- Llama Guard 3 adopted by 500+ AI safety research papers
- Llama 3 ranks top 3 in 80% of HF Open LLM Leaderboard categories
- Llama 2 monthly downloads peaked at 50M in Q4 2023
- Llama 3.2 vision demos viewed 1M+ on HF Spaces
- Llama models contribute to 15% of all HF model inferences
- Llama 3.1 used in 10K+ enterprise pilots reported
Comparisons and Rankings
- Llama 3 70B outperforms GPT-3.5 on 7/9 benchmarks
- Llama 3.1 405B improves on the April 2024 Llama 3 405B preview checkpoint on MMLU
- Llama 2 70B beats PaLM 540B on 5 commonsense benchmarks
- Code Llama 70B exceeds GPT-4 on MultiPL-E coding benchmark
- Llama 3 8B competitive with Mistral 7B on most evals
- Llama 3.1 70B ahead of Claude 3 Opus on GPQA by 5 points
- Llama 1 65B matches Chinchilla 70B performance at half compute
- Llama 3.2 90B beats GPT-4V on 3/5 vision-language tasks
- Llama Guard outperforms OpenAI moderation on safety benchmarks
- Llama 3 70B #2 on HF Open LLM Leaderboard behind only 405B preview
- Llama 2 70B-Instruct beats Vicuna 33B on MT-Bench by 4%
- Code Llama 34B surpasses StarCoder 15B on coding evals
- Llama 3.1 8B faster than Phi-3 mini at same quality
- Llama 3 ranks higher than Gemini 1.5 on LMSYS Arena coding
- Llama 2 7B outperforms BLOOM 7B on multilingual tasks
- Llama 3.2 11B multimodal tops Phi-3.5-vision on efficiency
- Llama Guard 3 safer than Llama 2 base by 20% violation reduction
- Llama 3.1 405B closes gap to GPT-4o on reasoning by 2%
- Llama 3 70B more parameter-efficient than Mixtral 8x7B
- Llama 2 beats Jurassic-1 on instruction following evals
Evaluation Benchmarks
- Llama 3.1 70B achieved 86.0% on the MMLU benchmark
- Llama 3.1 405B scores 88.6% on MMLU 5-shot
- Llama 2 70B attains 68.9% on MMLU
- Code Llama 70B achieves 53.7% on HumanEval pass@1
- Llama 3 8B scores 82.0% on GSM8K math benchmark
- Llama 1 65B reaches 63.7% on HellaSwag
- Llama 3.1 70B gets 73.0% on GPQA Diamond benchmark
- Llama 2 7B-Instruct scores 62.3% on MMLU
- Llama 3.2 11B vision achieves 72.5% on ChartQA
- Llama Guard 3 detects 85% of jailbreak attacks in safety eval
- Llama 3 70B-Instruct scores 81.7% on the HumanEval coding benchmark
- Code Llama 34B Python gets 55.4% on MBPP pass@1
- Llama 3.1 8B reaches 66.7% on ARC-Challenge
- Llama 2 70B-Instruct achieves 69.9% on MT-Bench
- Llama 3 8B scores 68.4% on TriviaQA
- Llama 1 13B gets 57.8% on PIQA commonsense
- Llama 3.2 90B scores 84.7% on MMMU vision benchmark
- Llama Guard 2 blocks 90% of unsafe prompts in internal evals
- Llama 3.1 405B attains 96.8% on GSM8K math reasoning
- Llama 3 70B ranks #1 open model on LMSYS Chatbot Arena
- Llama 3.1 405B Elo rating 1288 on LMSYS Arena
- Llama 2 70B Elo 1120 on Chatbot Arena leaderboard
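The Arena Elo ratings above translate into head-to-head preference rates under the standard Elo model. A quick sketch (the function name is ours):

```python
def elo_expected_score(r_a, r_b):
    # expected probability that model A is preferred over model B
    # under the standard Elo logistic model with a 400-point scale
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

# Llama 3.1 405B (Elo 1288) vs Llama 2 70B (Elo 1120)
print(f"{elo_expected_score(1288, 1120):.2f}")  # 0.72
```

In other words, the 168-point Elo gap corresponds to the newer model being preferred in roughly 72% of pairwise comparisons.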
Training Resources
- Llama 3 trained on over 15 trillion tokens using two custom 24K-GPU clusters
- Llama 3.1 405B trained on 3.8e25 FLOPs with custom data pipeline
- Llama 2 70B pre-trained on 2 trillion tokens
- Code Llama 70B continued pretraining on 500B code tokens
- Llama 3 used 24K GPU hours for post-training alignment
- Llama 1 65B trained on 1.4 trillion tokens with public sources
- Llama 3.1 8B fine-tuned with 10M synthetic preference pairs
- Llama 2 instruction tuning used 1M human preference annotations
- Llama 3.2 lightweight models trained on mobile-optimized datasets
- Code Llama Python trained on 100B Python tokens specifically
- Llama Guard trained on 1M safety examples across 14 categories
- Llama 3 pretraining spanned 15T tokens, including over 5% non-English data covering 30+ languages
- Llama 3.1 used a data cutoff of December 2023 with extensive quality filtering
- Llama 2 7B trained in 21 days on 384 A100 GPUs
- Llama 3 RLHF involved 10K human annotators indirectly
- Code Llama 7B fine-tuned with long-context code data up to 100K tokens
- Llama 1 used CommonCrawl, C4, GitHub data totaling 1T+ tokens
- Llama 3.2 vision models trained on 400M image-text pairs
- Llama Guard 3 uses multilingual safety data for 20+ languages
- Llama 2 post-training used rejection sampling for alignment
- Llama 3.1 405B training used up to 16K H100s for roughly 31M GPU-hours
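The 3.8e25 FLOPs figure above is consistent with the common 6·N·D estimate for dense-transformer training compute. A quick check, assuming the ~15.6T-token count Meta reported for Llama 3.1 (the helper name is ours):

```python
def training_flops(n_params, n_tokens):
    # standard 6*N*D approximation: ~6 FLOPs per parameter per training token
    # (2 for the forward pass, 4 for the backward pass)
    return 6 * n_params * n_tokens

flops = training_flops(405e9, 15.6e12)
print(f"{flops:.1e}")  # 3.8e+25
```

The same estimate applied to Llama 2 70B (2T tokens) gives about 8.4e23 FLOPs, roughly 45x less compute than the 405B run.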
Sources & References
- Meta AI (ai.meta.com)
- Hugging Face (huggingface.co)
- Llama (llama.meta.com)
- arXiv (arxiv.org)
- LMSYS (lmsys.org)
- LMSYS Chatbot Arena (arena.lmsys.org)
- GitHub (github.com)
- Kaggle (kaggle.com)